Loïc Grenié on Sun, 03 Dec 2023 15:31:36 +0100



Re: PARI/GP pthread questions


On Sun Dec 3, 2023 at 15:10, Karim Belabas wrote:
* Bill Allombert [2023-12-03 13:15]:
[...]
> > 2)
> > Under 3.4.22 nbthreads
> > ...
> > * pthread: number of threads (unlimited, default: number of cores)
> > ...
> >
> > On a 16C/32T AMD 7950X CPU I see 3200% CPU in top when starting a
> > GP script with parforeach.
> >
> > So should the above doc be corrected to
> >
> > "default: #cores * #threads_per_core"
> >
> > or better to be correct for multi-CPU systems:
>
> When we wrote that, hyperthreading was only used in mainframes.
> It would be much better if the default were the number of cores instead of the number
> of hyperthreads.
> Unfortunately the GNU C library only reports the number of CPU threads
> (see getconf "_NPROCESSORS_CONF").
> I recommend setting nbthreads to the total number of cores.

Is there a difference from the simpler 'nproc' (from coreutils)?

On systems with hyper-threading, one can get the number of physical cores
as follows (here on my laptop):

# grep '^cpu cores' /proc/cpuinfo | uniq
cpu cores       : 4
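The 'cpu cores' field counts the cores of a single physical package, so on a multi-socket machine the grep above reports the per-socket figure, not the machine-wide total the original question asked about. A minimal sketch of a machine-wide count, assuming the x86 Linux /proc/cpuinfo layout with 'physical id' and 'core id' fields (other architectures may omit them, in which case it prints 0); count_physical_cores is a hypothetical helper name used only for this illustration:

```shell
#!/bin/sh
# count_physical_cores: hypothetical helper, not part of PARI or coreutils.
# Hyperthread siblings share the same (physical id, core id) pair, so the
# number of distinct pairs is the number of physical cores over all sockets.
# Optional argument: a file in /proc/cpuinfo format (default: /proc/cpuinfo).
count_physical_cores() {
    awk -F': *' '
        /^physical id/ { pkg = $2 }              # socket of the current CPU entry
        /^core id/     { seen[pkg "," $2] = 1 }  # core within that socket
        END { n = 0; for (k in seen) n++; print n }
    ' "${1:-/proc/cpuinfo}"
}

count_physical_cores
```

On a single-socket laptop like the one above this should agree with the 'cpu cores' line; the two differ only on multi-socket systems. Where util-linux is installed, 'lscpu' shows the same information in its 'Core(s) per socket' and 'Socket(s)' lines.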

# getconf _NPROCESSORS_CONF
8

     The system calls made by getconf are:
openat(AT_FDCWD, "/sys/devices/system/cpu/possible", O_RDONLY|O_CLOEXEC) = 3
read(3, "0-7\n", 1024)                 = 4
 
# nproc
8

     The system call made by nproc is:
sched_getaffinity(0, 128, [0 1 2 3 4 5 6 7]) = 8

    Indeed, 'taskset -c 1-5 nproc' returns 5.
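Both numbers can also be read from the files behind those calls. A small sketch, assuming Linux: it counts the CPUs listed in /sys/devices/system/cpu/possible (the file getconf opens in the trace above) and in the Cpus_allowed_list line of /proc/self/status, which normally matches the affinity mask that sched_getaffinity() returns. count_cpu_list is a hypothetical helper for kernel cpulist strings such as "0-7" or "1-5". Note that coreutils nproc can additionally be influenced by the OMP_NUM_THREADS and OMP_THREAD_LIMIT environment variables.

```shell
#!/bin/sh
# count_cpu_list: hypothetical helper that counts the CPUs in a kernel
# cpulist string such as "0-7", "1-5" or "0,2-5".
count_cpu_list() {
    echo "$1" | tr ',' '\n' | awk -F- '
        NF == 2 { n += $2 - $1 + 1 }   # a range "a-b" holds b - a + 1 CPUs
        NF == 1 { n += 1 }             # a single CPU number
        END { print n + 0 }
    '
}

# CPUs the kernel considers possible (what getconf _NPROCESSORS_CONF reads).
count_cpu_list "$(cat /sys/devices/system/cpu/possible)"
# CPUs in this process's affinity mask (what nproc reports).
count_cpu_list "$(awk '/^Cpus_allowed_list/ { print $2 }' /proc/self/status)"
```

Running the script under 'taskset -c 1-5' should change only the second count (to 5), mirroring the nproc and getconf results above.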

Maybe we could include the number of physical cores in the output of
'Configure --mt=pthread'? And add to INSTALL.tex a recommendation to
override the default value of nbthreads in gprc.
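Until something like this is documented, the override can be scripted by hand. A minimal sketch, assuming Linux with util-linux lscpu, assuming gprc accepts the usual 'default = value' lines, and assuming gp reads $GPRC if set and otherwise ~/.gprc; set_gprc_nbthreads is a hypothetical helper, not something shipped with PARI. It writes one thread per physical core, following Bill's recommendation, which is not necessarily optimal for every machine or computation.

```shell
#!/bin/sh
# set_gprc_nbthreads: hypothetical helper, not part of PARI.
# Usage: set_gprc_nbthreads [file]   (default: $GPRC, else ~/.gprc)
set_gprc_nbthreads() {
    gprc=${1:-${GPRC:-$HOME/.gprc}}
    # lscpu prints one CSV row per online logical CPU; hyperthread siblings
    # share the same (CORE,SOCKET) pair, so unique rows = physical cores.
    cores=$(lscpu --parse=CORE,SOCKET | grep -v '^#' | sort -u | wc -l | tr -d ' ')
    if [ "$cores" -gt 0 ]; then
        echo "nbthreads = $cores" >> "$gprc"
    else
        echo "could not determine the number of physical cores" >&2
        return 1
    fi
}
```

The function only appends a line; running 'set_gprc_nbthreads' once would add something like 'nbthreads = 16' on the 16-core 7950X mentioned earlier, and an existing nbthreads line in the file should be removed by hand.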

      Contrary to Bill's experience, I've found that using hyperthreading
  slightly improves running time on my computers (generic no-name
  laptops). Similarly, pinning each thread to a (virtual) core improves
  running time, though not by much either.
 
Not sure whether it would be wise to change the default in
  pthread.c:pari_mt_init().

     As far as I can tell, the optimum depends on the system and the
  computation.

         Best,

             Loïc