i'm comparing lu decomposition/solve of eigen gsl, , find eigen on order of 2x slower -o3 optimizations on g++/osx. isolated timing of decompose , solve, find both substantially slower gsl counterparts. doing silly, or eigen not perform use case (e.g. small systems?) built eigen 3.2.8 , older gsl 1.15. test case contrived, mirrors results in nonlinear-fitting software i'm writing - eigen being anywhere 1.5x - 2x+ slower total lu/solve operation.
#define ndebug #include "sys/time.h" #include "gsl/gsl_linalg.h" #include <eigen/lu> // ax=b 3x3 system soln x=[8,2,3] // double avals_col[9] = { 4, 2, 2, 7, 5, 5, 7, 5, 9 }; // col major double avals_row[9] = { 4, 7, 7, 2, 5, 5, 2, 5, 9 }; // row major double bvals[9] = { 67, 41, 53 }; //----------- helpers void print_solution( double *x, int dim, char *which ) { printf( "%s solve x:\n", ); for( int j=0; j<3; j++ ) { printf( "%g ", x[j] ); } printf( "\n" ); } struct timeval tv; struct timezone tz; double timenow() { gettimeofday( &tv, &tz ); int _mils = tv.tv_usec/1000; int _secs = tv.tv_sec; return (double)_secs + ((double)_mils/1000.0); } //----------- void run_gsl( double *a, double *b, double *x, int dim, int reps ) { gsl_matrix_view gsla; gsl_vector_view gslb; gsl_vector_view gslx; gsl_permutation *gslp; int sign; gsla = gsl_matrix_view_array( a, dim, dim ); gslp = gsl_permutation_alloc( dim ); gslb = gsl_vector_view_array( b, dim ); gslx = gsl_vector_view_array( x, dim ); int err; double t, elapsed; t = timenow(); for( int i=0; i<reps; i++ ) { // gsl overwrites during decompose, must copy initial each time. memcpy( a, avals_row, sizeof(avals_row) ); err = gsl_linalg_lu_decomp( &gsla.matrix, gslp, &sign ); } elapsed = timenow() - t; printf( "gsl decompose (%dx) time = %g\n", reps, elapsed ); t = timenow(); for( int i=0; i<reps; i++ ) { err = gsl_linalg_lu_solve( &gsla.matrix, gslp, &gslb.vector, &gslx.vector ); } elapsed = timenow() - t; printf( "gsl solve (%dx) time = %g\n", reps, elapsed ); gsl_permutation_free( gslp ); } void run_eigen( double *a, double *b, double *x, int dim, int reps ) { eigen::partialpivlu<eigen::matrixxd> eigena_lu; eigen::map< eigen::matrix < double, eigen::dynamic, eigen::dynamic, eigen::colmajor > > ma( a, dim, dim ); eigen::map<eigen::matrixxd> mb( b, dim, 1 ); int err; double t, elapsed; t = timenow(); for( int i=0; i<reps; i++ ) { // memcpy not necessary eigen, not overwrite in // decompose, time directly comparable gsl. memcpy( a, avals_col, sizeof(avals_col) ); eigena_lu.compute( ma ); } elapsed = timenow() - t; printf( "eigen decompose (%dx) time = %g\n", reps, elapsed ); t = timenow(); eigen::vectorxd _x; for( int i=0; i<reps; i++ ) { _x = eigena_lu.solve( mb ); } elapsed = timenow() - t; printf( "eigen solve (%dx) time = %g\n", reps, elapsed ); // copy soln passed x for( int i=0; i<dim; i++ ) { x[i] = _x(i); } } int main() { // solve 3x3 system many times double a[9], b[3], x[3]; int dim = 3; int reps = 1000000; memcpy( b, bvals, sizeof(bvals) ); // init b vector, copied multiple times in run_gsl/run_eigen run_eigen( a, b, x, dim, reps ); print_solution( x, dim, "eigen" ); run_gsl( a, b, x, dim, reps ); print_solution( x, dim, "gsl" ); return 0; }
this produces, example:
eigen decompose (1000000x) time = 0.242 eigen solve (1000000x) time = 0.108 eigen solve x: 8 2 3 gsl decompose (1000000x) time = 0.049 gsl solve (1000000x) time = 0.075 gsl solve x: 8 2 3
your benchmark not fair doing copy of input matrix twice in eigen version: 1 manually through memcpy
, , 1 within partialpivlu
. let eigen knowns mb vector declaring map<eigen::vectord>
. (gcc5,-o3,eigen3.3):
eigen decompose (1000000x) time = 0.087 eigen solve (1000000x) time = 0.036 eigen solve x: 8 2 3 gsl decompose (1000000x) time = 0.032 gsl solve (1000000x) time = 0.062 gsl solve x: 8 2 3
moreover, eigen's partialpivlu not designed such extremely tiny matrices (see below). 3x3 matrices, better explicitly compute inverse (for matrices 4x4 usually, ok, not larger ones!). in case must fix sizes @ compile-time:
eigen::partialpivlu<eigen::matrix3d> eigena_lu; eigen::map<eigen::matrix3d> ma(avals_col); eigen::map<eigen::vector3d> mb(b); eigen::matrix3d inv; eigen::vector3d _x; double t, elapsed; t = timenow(); for( int i=0; i<reps; i++ ) { inv = ma.inverse(); } elapsed = timenow() - t; printf( "eigen decompose (%dx) time = %g\n", reps, elapsed ); t = timenow(); for( int i=0; i<reps; i++ ) { _x.noalias() = inv * mb; } elapsed = timenow() - t; printf( "eigen solve (%dx) time = %g\n", reps, elapsed );
which gives me:
eigen inverse , solve (1000000x) time = 0.0209999 eigen solve (1000000x) time = 0.000999928
so faster.
now if try larger problem, 3000 x 3000, more 1 order of magnitude of difference in favor of eigen:
eigen decompose (1x) time = 0.411 gsl decompose (1x) time = 6.073
this typically optimizations allows such performance large problems introduces overhead tiny matrices.
Comments
Post a Comment