Addons/math/mt/Benchmarking

From J Wiki

Objective

Benchmark the GEMM, TRSM and GESV methods provided by J primitives, the mt addon, and the BLIS and OpenBLAS library wrappers, in both single-threaded and multi-threaded environments.

Preparation

Prepare the external libraries that provide the methods to compare:

user@host:~/lib> ls -1
libblis_threads=1.so
libblis_threads=n.so
libopenblas_threads=1.so
libopenblas_threads=n.so

By hand

To estimate performance, raw data from the test log can be used, e.g.:

   load 'math/mt'
   mkmat=. _1 1 0 3 _6 4&gemat_mt_
   log=. mkmat testbasicmm_mt_ 2 # 1000
   'sts tms'=. 0 4 { log

In the code snippet above, various matrix-multiply methods were tested on random floating-point 1000*1000 matrices in a single-threaded environment. The executed sentences were saved in the rank-2 character array sts (one sentence per row), and the estimated execution durations were saved in the vector tms (one atom per sentence):

   sts ; ,. tms
+-----------------+--------+
|(+/ .*)          |0.000345|
|mp               |0.000346|
|dgemmnn_mtbla_   |0.003074|
|...              |...     |
+-----------------+--------+

See the mt.ijs file for the log format. The execution duration of each sentence is estimated as proposed in [1]: "the minimum run-time of 3-5 executions of the program when the machine is lightly loaded."
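The minimum-of-several-runs estimate can be sketched with the timing foreign 6!:2. This is only an illustration and not necessarily how the mt addon does it: the verb name mintime and the test matrix a are hypothetical.

```j
NB. Hypothetical helper (not part of the mt addon): execute sentence y
NB. x times and return the smallest elapsed time, in seconds.
mintime=: 4 : '<./ 6!:2"1 x # ,: y'

a=: ? 200 200 $ 0          NB. random float 200*200 test matrix
5 mintime 'a +/ . * a'     NB. minimum run-time of 5 executions
```

Taking the minimum rather than the mean discards runs that were slowed down by other activity on the machine.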

Given the problem sizes and the measured execution durations, other indicators can be computed, e.g. FLOPS or "duration per value".
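As a rough sketch of such indicators, assume every timed sentence is a dense n*n matrix product, which costs about 2*n^3 floating-point operations. The verb names gflops and pervalue are hypothetical, and the durations below are placeholders rather than measurements.

```j
NB. Assumption: each sentence is a dense n-by-n matrix product (about 2*n^3 flops).
gflops=: 4 : '1e_9 * (2 * x ^ 3) % y'   NB. x: size n, y: durations in seconds
pervalue=: 4 : 'y % *: x'               NB. seconds per element of the n-by-n result

tms=: 0.5 0.25 2     NB. placeholder durations, NOT measured; use tms from the log instead
1000 gflops tms
1000 pervalue tms
```

Methods with a different operation count (e.g. triangular products) would need their own flop formula in place of 2*n^3.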

By customized script

A specialized script, however, makes benchmarking much simpler and more convenient. Save the script File:Bmk.ijs as ~temp/bmk.ijs and run it:

user@host:~/j9.6> BLIS_NUM_THREADS=8 ./jconsole.sh
   load '~temp/bmk.ijs'
   nn=. 100 liso4dhs_mt_ 100 60  NB. repeat for n=100..6000 with step 100
   bmk_mttmp_ nn
... (output is skipped)
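Going by the comment on nn above (n=100..6000 with step 100), the same list of problem sizes can presumably be built from primitives alone. The name sizes is hypothetical, and the equivalence with liso4dhs_mt_ is an assumption based on that comment only.

```j
sizes=: 100 + 100 * i. 60    NB. 100 200 300 ... 6000
({. , {: , #) sizes          NB. first size, last size, number of sizes
```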

Running this script creates 6 text files with numeric data (3 matrix methods * 2 thread modes, single and multi) and 6 corresponding graph files (.pdf when run within jconsole, or .png when run within Qt Jconsole):

user@host:~/j-user/temp> ls -1 bmk_*
bmk_GEMM_threads=1.dat
bmk_GEMM_threads=1.pdf
bmk_GEMM_threads=n.dat
bmk_GEMM_threads=n.pdf
bmk_GESV_threads=1.dat
bmk_GESV_threads=1.pdf
bmk_GESV_threads=n.dat
bmk_GESV_threads=n.pdf
bmk_TRSM_threads=1.dat
bmk_TRSM_threads=1.pdf
bmk_TRSM_threads=n.dat
bmk_TRSM_threads=n.pdf

Figures: Bmk GEMM threads=1.png, Bmk GEMM threads=n.png, Bmk TRSM threads=1.png, Bmk TRSM threads=n.png, Bmk GESV threads=1.png, Bmk GESV threads=n.png

References

  1. Magne Haveraaen, Hogne Hundvebakke. Some Statistical Performance Estimation Techniques for Dynamic Machines. In: Weihai Yu et al. (eds.), Norsk Informatikk-konferanse 2001, Tapir, Trondheim, Norway, 2001, pp. 176-185. URL: https://www.ii.uib.no/saga/papers/perfor-5d.pdf