I have a small database (on one machine, n1) and a large database (on another machine, n2, with a billion records) that need to be joined. The app server needs to read the data from the DB servers into memory. Should it read the small DB first and then read the second DB?
How can the join be executed fastest? How is this done in real life in general?
Generally, you should try to push the processing into the database. Maybe the big database server can pull the small one over locally and process the join on the server.
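A minimal sketch of that idea, assuming the small table can be copied into a temporary table on the big server. An in-memory SQLite database stands in for the big server here, and the table names, column names and the helper `join_on_big_server` are all hypothetical; a real setup would use whatever bulk-load or linked-table mechanism the database offers.

```python
import sqlite3

# Stand-in for the big database server (n2). In-memory SQLite is an
# assumption made purely so the sketch is runnable.
big_db = sqlite3.connect(":memory:")
big_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customerid INTEGER, amount REAL)"
)
big_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 5.0), (2, 11, 7.5), (3, 10, 1.25), (4, 99, 3.0)],
)

# Rows already fetched from the small database (n1); hypothetical data.
small_rows = [(10, "Alice"), (11, "Bob")]


def join_on_big_server(conn, customers):
    """Copy the small table to the big server and let the server do the join."""
    conn.execute(
        "CREATE TEMP TABLE customers_copy (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany("INSERT INTO customers_copy VALUES (?, ?)", customers)
    cur = conn.execute(
        "SELECT o.id, c.name, o.amount "
        "FROM orders AS o JOIN customers_copy AS c ON o.customerid = c.id "
        "ORDER BY o.id"
    )
    return cur.fetchall()


result = join_on_big_server(big_db, small_rows)
```

Only the small table and the joined result cross the network in this arrangement; the billion-row table never leaves its server.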
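If you want to process the join in the application, a common and optimal strategy is to perform a hash join. Convert the small data set into a hash table. Then you can probe the items of the big data set against that hash table. This requires little memory (only the small set is held in memory) and little CPU (one hash lookup per row of the big set), and you can stream the big data set.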
This strategy works only if the join condition is an equality (e.g. orders.customerid = customers.id) and one of the two sets is small enough to fit in memory.
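The hash join described above can be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: rows are plain dicts, the function `hash_join` and the sample data are hypothetical, and the big data set is modelled as a generator to stand in for a streaming database cursor.

```python
from collections import defaultdict


def hash_join(small, big, small_key, big_key):
    """Inner equi-join: build a hash table on `small`, then stream `big`."""
    # Build phase: the whole small set is held in memory, grouped by join key.
    table = defaultdict(list)
    for row in small:
        table[small_key(row)].append(row)

    # Probe phase: each big row is looked up once and then discarded,
    # so the big set never has to fit in memory.
    for row in big:
        for match in table.get(big_key(row), ()):
            yield match, row


# Hypothetical sample data: customers is the small set, orders the big one.
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]


def stream_orders():
    # Stand-in for a cursor that yields rows from the big database one by one.
    yield {"id": 100, "customerid": 2}
    yield {"id": 101, "customerid": 1}
    yield {"id": 102, "customerid": 7}  # no matching customer, dropped
    yield {"id": 103, "customerid": 2}


joined = [
    (customer["name"], order["id"])
    for customer, order in hash_join(
        customers,
        stream_orders(),
        small_key=lambda c: c["id"],
        big_key=lambda o: o["customerid"],
    )
]
```

Because the probe side is consumed lazily, memory use is bounded by the size of the small set plus one row of the big set at a time. The lookup relies on key equality, which is why the approach does not extend to range or inequality join conditions.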