hermann on Fri, 30 Jun 2023 15:12:01 +0200



Re: Can PARI code be forced to run in 32MB L3 cache, mostly without RAM?


I read that the perf events "l2_cache_accesses_from_dc_misses,all_data_cache_accesses" can help in understanding L1 cache misses, and all(?) cache use.

So does the execution of the PARI code below say that the cache was accessed 3,372,491,167 times, and that L1 had a cache miss 37,845,884 times?
How can I measure L2 misses and L3 cache misses?

How can I see how much RAM access was done?

What about parisize and parisizemax settings for making PARI code run inside the cache?


hermann@7600x:~/RSA_numbers_factored/c++$ LD_LIBRARY_PATH=/home/hermann/Downloads/pari-2.15.3/GPDIR/lib \
perf stat -e l2_cache_accesses_from_dc_misses,all_data_cache_accesses,cycles,task-clock \
./sqrtm1.smallest_known_1million_digit_prime
a = y^(-1) (mod p) [powm]; a *= x; a %= p
0.195755s
[M,V] = halfgcdii(sqrtm1, p)
0.217144s
[x,y] = [V[2], M[2,1]]
1e-06s
done

Performance counter stats for './sqrtm1.smallest_known_1million_digit_prime':

        37,845,884      l2_cache_accesses_from_dc_misses #   79.826 M/sec
     3,372,491,167      all_data_cache_accesses   #    7.113 G/sec
     2,572,672,013      cycles                    #    5.426 GHz
            474.10 msec task-clock                #    0.999 CPUs utilized

       0.474457023 seconds time elapsed

       0.466410000 seconds user
       0.008041000 seconds sys


hermann@7600x:~/RSA_numbers_factored/c++$
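
To put the counters from this run in relation to each other, here is a small Python sketch. It only does arithmetic on the numbers printed by perf stat. It assumes that l2_cache_accesses_from_dc_misses can be read as "L2 accesses triggered by L1 data cache misses", which is my reading of the event name and not something I have confirmed. The variable names are my own.

```python
# Counter values copied from the perf stat output of this run.
l2_accesses_from_dc_misses = 37_845_884
all_data_cache_accesses = 3_372_491_167
cycles = 2_572_672_013
task_clock_s = 0.47410  # 474.10 msec

# Share of data cache accesses that went on to L2
# (assumption: the event counts L1D misses that reach L2).
miss_share = l2_accesses_from_dc_misses / all_data_cache_accesses

# Data cache accesses issued per CPU cycle.
accesses_per_cycle = all_data_cache_accesses / cycles

# Average clock frequency, as a cross-check against perf's own "GHz" column.
clock_ghz = cycles / task_clock_s / 1e9

print(f"L2 accesses from L1D misses per data cache access: {miss_share:.2%}")
print(f"data cache accesses per cycle: {accesses_per_cycle:.2f}")
print(f"average clock: {clock_ghz:.3f} GHz")
```

Under that reading, roughly 1.1% of the data cache accesses went on to L2. This does not say anything yet about how many of those were served by L2, by L3 or by RAM.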


Regards,

Hermann.

On 2023-06-29 19:58, hermann@stamm-wilbrandt.de wrote:
There are 31 hits for "cache", but none helped me to answer the question in the subject:
https://pari.math.u-bordeaux.fr/pub/pari/manuals/2.15.1/libpari.pdf

If it is possible to restrict most of the computation to L3 cache,
does one need to restrict parisize and parisizemax to that value as
well?
Or lower?

My 7600X CPU has these caches:
https://github.com/Hermann-SW/7600X#details-of-pc
L1 Cache  384KB
L2 Cache  6MB
L3 Cache  32MB
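
The sizes above are totals as listed for the CPU. What a single core sees can be read on Linux from sysfs under /sys/devices/system/cpu/cpuN/cache/. The Python sketch below does that for cpu0. The function names parse_size and read_caches are my own, and I am assuming the usual sysfs layout with the level, type, size and shared_cpu_list files.

```python
from pathlib import Path


def parse_size(text):
    """Convert a sysfs size string such as '32K' or '32768K' to bytes."""
    text = text.strip()
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def read_caches(cpu=0, root="/sys/devices/system/cpu"):
    """Return (level, type, size_in_bytes, shared_cpu_list) for each cache of one CPU."""
    base = Path(root) / f"cpu{cpu}" / "cache"
    if not base.is_dir():
        return []  # not Linux, or sysfs does not expose cache info here
    caches = []
    for index in sorted(base.glob("index[0-9]*")):
        try:
            level = int((index / "level").read_text())
            kind = (index / "type").read_text().strip()
            size = parse_size((index / "size").read_text())
            shared = (index / "shared_cpu_list").read_text().strip()
        except (OSError, ValueError):
            continue  # skip entries with missing or unexpected files
        caches.append((level, kind, size, shared))
    return caches


if __name__ == "__main__":
    for level, kind, size, shared in read_caches():
        print(f"L{level} {kind:<12} {size // 1024:>6} KiB  shared by CPUs {shared}")
```

The shared_cpu_list column shows which caches are private to a core and which are shared, which matters when deciding what "fits into cache" means for a single-threaded program.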

Under Linux, how can I tell whether a program runs mostly from cache and
not from RAM?


My only C++ PARI code so far:
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/c%2B%2B/sqrtm1.smallest_known_1million_digit_prime.cc#L68-L93

I could reduce the input prime size down from 1 million digits for testing,
so that the computation fits into L3 cache.
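
For a rough idea of the sizes involved, here is a Python sketch that estimates the storage of one integer with a given number of decimal digits and compares multiples of it with the 32MB L3. The number of operands that are live at the same time is an assumption (I do not know the real figure for powm or halfgcdii), and the estimate ignores any per-object header words PARI adds.

```python
import math

L3_BYTES = 32 * 1024**2  # L3 size quoted earlier in this message


def operand_bytes(decimal_digits, word_bits=64):
    """Upper-bound storage for one integer, rounded up to whole machine words."""
    bits = math.ceil(decimal_digits * math.log2(10))
    words = math.ceil(bits / word_bits)
    return words * word_bits // 8


def max_digits_for(cache_bytes, live_operands):
    """Largest digit count such that live_operands integers fit in cache_bytes."""
    bits_per_operand = cache_bytes * 8 // live_operands
    return math.floor(bits_per_operand / math.log2(10))


one = operand_bytes(1_000_000)
print(f"one 1,000,000-digit integer: about {one / 1024:.0f} KiB")

# Hypothetical numbers of simultaneously live operands and temporaries.
for k in (8, 32, 64):
    total = k * one
    print(f"{k:>2} operands: {total / 1024**2:.1f} MiB, fits in L3: {total <= L3_BYTES}")
```

A single 1-million-digit integer is only about 0.4 MB, so whether the run stays in L3 depends on how many temporaries of that size are in use, not on the size of the prime alone.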


Regards,

Hermann.