Ilya Zakharevich on Thu, 18 Jan 2024 08:41:07 +0100


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: Implementation of forprime() and unextprime()


On Mon, Jan 15, 2024 at 07:21:29PM +0100, Bill Allombert wrote:
> On Thu, Jan 11, 2024 at 07:09:51PM -0800, Ilya Zakharevich wrote:
> >   B) unextprime() (the ulong version of nextprime() from
> >      basemath/ifactor1.c) has a rather convoluted code to sieve mod
> >      210.  Why not:
> > 
> >       • just store tables with a bitmap of a few (say 8) next odd
> >       	residues mod N which are mutually prime with N.  (Indexed by
> >       	“an odd” residue mod N.)
> >       • For a given odd number, extract the corresponding bitmaps from
> >       	the tables for a few values of N (such as 210, 11*19, 13*17, 23 and 29).
> >       • “bit-AND” these bitmaps.
> >       • Use the count of trailing/leading 0 bits (should be cheap) to
> >       	jump ahead.
> > 
> >      I would not be surprised if this can speedup the code by 20%
> >      (maybe even a bit more for numbers closer to 2^64).

> I have made a branch bill-unextprime with a simpler algorithm,
> but the speed is exactly the same, which means that all the time is
> spent in uisprime.

Sorry that I was not clear enough:

   The purpose of the suggestion above is
     • not to speed up sieving mod 210, but
     • to significantly (5–6 orders of magnitude?!) slow it down.

Currently, unextprime() is called every 4.375 integers (on average).
My (quite optimistic) back-of-the-envelop calculations show that

  • One can call unextprime() about 5.64 times more rarely.
  • Due to slow-down of the sieving, this would lead to ∼5 times
    speedup¹⁾ of this case of forprime().

      ¹⁾ With sieving called very rarely: about once per 19000 calls
      	 to unextprime().

For this (all numbers approximate based on rough calculations with
experimental data about timing on my processor):

  • One should partially-sieve a range of 480K ulong⸣s using primes up
    to about 950000. 
  • The bitmap for sieving should be placed so that it does not
    shadow-mod-32K the first 2K of the table of differences of primes.

     (Moreover, this speedup of ∼5 times seems to be quite robust: I
      got it with 2 different models of slowdowns of a processor.  It
      may have ”some mathematical origin”…)

(I hope to find time to write down these back-of-the-envelop
calculations…)

Hope this helps,
Ilya