Bill Allombert on Mon, 19 Jun 2023 23:49:51 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: Parallel Processing of subsets


On Mon, Jun 19, 2023 at 10:09:32PM +0200, Daniel Berger wrote:
> Hello,
> 
> I want to do computations on subsets of a given set, that is quite large
> (around 600+ elements). Since there are quite many subsets, I do not
> necessarily want to process them all, but still as many as possible given my
> computational resources, using parallel processing.
> 
> For smaller sets (~25 elements), i have successfully used parfor, iterating
> over the number of subsets and using vecextract to get to the subsets
> themselves.
> 
> However, I don't think this approach is optimal for two reasons:
> 
>  * Each computation only takes a very short amount of time (few ms), so
>    the added overhead wastes resources (or so I assume after having
>    read the resources on using thread in PARI/GP), I think it would be
>    better to split up the data into chunks and let each core compute a
>    bunch before returning and getting a new batch of data.
>  * It would be better for me to start with smaller subsets (like
>    forsubset works) and the above approach doesn't do this.
> 
> So my questions are: Is my assumption above correct?

I would say yes.

> What is the easiest and/or best way to do this in PARI/GP?

Functions like forvec/forsubset etc are nice but often inefficient because
you need to recompute everything from scratch. One trick to speed up
computation is to reuse part of previous computation, which require considering
subset in a suitable order (think Gray code)

An example: solve the knapsack problem:

V=[2,3,5]
S=7
forsubset(3,s,if(vecsum(vecextract(V,s))==S,print(s)));

This is nice, but this redo the summation for each set.

{
  for(s1=0,1,
    my(t1=if(s1,V[1],0));
    for(s2=0,1,
      my(t2=if(s2,t1+V[2],t1));
      for(s3=0,1,
        my(t3=if(s3,t2+V[3],t2));
        if(t3==S,print([s1,s2,s3])))))
}
this reduce the number of additions to perform from 16 to 6.

Cheers,
Bill