Loïc Grenié on Mon, 05 Jun 2023 22:32:33 +0200



Re: parsum and nbthreads




On Mon, 5 Jun 2023 at 17:54, Bill Allombert <Bill.Allombert@math.u-bordeaux.fr> wrote:
On Mon, Jun 05, 2023 at 03:47:11PM +0200, Loïc Grenié wrote:
>     I've pushed a "loic-parsum" branch to pari git. It does not change the
>   number of threads (I still think it's not optimal right now), however it
>   addresses most of the problems I've illustrated before, should not hurt
>   performance, as far as I can tell, and passes the tests involving parsum
>   (export, parallel, programming).
>
>     The drawback is that it exports one more function (+1 line in paripriv,
>   and +7 non-empty lines in src/functions/programming/parsum) and replaces
>   two functions with a longer one (+14 non-empty lines in
>   src/language/eval.c).
>
>        Could you consider it for inclusion, possibly modified?

Sure. Do you have some tests where it makes a difference ?

     Sure.

     My gp here is configured as follows:
parisizemax = 2000003072, primelimit = 500000, nbthreads = 16
? default(threadsizemax,)
%1 = 1000000000

    With a7bed2c7d7 (patch not applied):

? parsum(a=1,1000000000,a)
cpu time = 1min, 1,997 ms, real time = 23,720 ms.
%1 = 500000000500000000
? parsum(a=1,10000,print(a);setrand(a);matrix(10^3,2*10^2,i,j,random(100)))
(lots of "increasing stack size")
cpu time = 4min, 17,486 ms, real time = 47,195 ms.
%2 = cancelled
? parsum(a=1,1000,setrand(a);matrix(10^4,10^3,i,j,random(100)))
(lots of "increasing stack size")
  ***   at top-level: parsum(a=1,1000,print(a);setrand(a);matrix(10^
  ***                 ^----------------------------------------------
  *** parsum: the thread stack overflows !
  current stack size: 1000000000 (953.674 Mbytes)
  [hint] you can increase 'threadsizemax' using default()

  ***   Break loop: type 'break' to go back to GP prompt

   With 54a9a89f9f (patch applied)
? parsum(a=1,1000000000,a)
cpu time = 1min, 54,142 ms, real time = 7,633 ms.
%1 = 500000000500000000
? parsum(a=1,10000,setrand(a);matrix(10^3,2*10^2,i,j,random(100)))
cpu time = 8min, 7,098 ms, real time = 33,973 ms.
%2 = cancelled
? parsum(a=1,1000,setrand(a);matrix(10^4,10^3,i,j,random(100)))
(lots of "increasing stack size")
cpu time = 53min, 8,210 ms, real time = 3min, 45,107 ms.
%3 = cancelled

    The first sum has lots of easy terms. The copying and central summing
  take most of the time, and ultimately prevent parallelism on the unpatched
  pari (fewer than 3 threads are in use on average).
  The second has a reasonable number of moderately large objects. The parallelism
  is not very good on the unpatched pari (fewer than 5.5 threads in use on average).
  The third has objects that are too large.

     In each case, I can do better with parfor. However, since parsum exists, I expect
  it to perform reasonably well in reasonable situations. It's clear that if the objects
  I sum all fall in a single thread, then parallelism will be bad, but that is not the
  situation I'm presenting here: all the threads are roughly equal and the objects are
  not awful (either very simple or relatively large).
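
     To illustrate the kind of parfor workaround meant here, below is a minimal
  sketch for the first sum only. It is not the code from the loic-parsum branch:
  the name blocksum and the block size B are hypothetical, chosen for
  illustration. Each parallel task sums one block of B consecutive integers
  locally, so only one partial sum per block is copied back and added in the
  main thread.

```gp
\\ Hypothetical helper (not part of PARI): sum a for a = 1..N in blocks of
\\ size B. expr1 of parfor runs in the worker threads; the final
\\ accumulation s += r runs sequentially in the main thread.
blocksum(N, B) =
{
  my(nb = (N + B - 1) \ B, s = 0);   \\ nb = number of blocks
  parfor(k = 1, nb,
    sum(a = (k-1)*B + 1, min(k*B, N), a),  \\ partial sum of block k
    r,
    s += r);                               \\ central summing, one term per block
  s;
}
```

     For instance, blocksum(10^6, 10^4) == 10^6*(10^6+1)/2 can be used as a
  quick sanity check. A good value of B depends on the machine and on the cost
  of each term, so it is left as a parameter.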

      I still think that the number of threads should also be changed when the
  number of objects is small: if I want to compute
  parsum(a=1,20,bnfinit(x^(a+50)-2).reg), I fail to understand why I should
  have to do it using vecsum(parvector), even though it works.
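
      For reference, the vecsum(parvector) form of that computation would look
  roughly as follows. The function names regsum and sqsum are hypothetical
  wrappers added for illustration; sqsum is a cheap stand-in with the same
  shape, useful for checking the pattern without waiting for twenty bnfinit
  calls.

```gp
\\ parsum form from the message:
\\   parsum(a = 1, 20, bnfinit(x^(a+50) - 2).reg)
\\ Workaround: compute the 20 terms in parallel, then sum them in the main thread.
regsum() = vecsum(parvector(20, a, bnfinit(x^(a+50) - 2).reg));

\\ Cheap stand-in with the same structure (sum of squares 1..n).
sqsum(n) = vecsum(parvector(n, a, a^2));
```

      Here sqsum(20) == sum(a = 1, 20, a^2) should hold; regsum() may take a
  long time, since each term requires a full bnfinit.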

        Best,

             Loïc