No improvements using multiprocessing


I tested the performance of map, mp.dummy.Pool.map and mp.Pool.map.

import itertools
import multiprocessing as mp
from multiprocessing.dummy import Pool as ThreadPool
import numpy as np

# wrapper function: call args[0] on the remaining arguments
def wrap(args):
    return args[0](*args[1:])

# make data arrays
x = np.random.rand(30, 100000)
y = np.random.rand(30, 100000)

# map
%timeit -n10 map(wrap, itertools.izip(itertools.repeat(np.correlate), x, y))

# mp.dummy.Pool.map
for i in range(2, 16, 2):
    print 'thread pool ', i, ' : ',
    t = ThreadPool(i)
    %timeit -n10 t.map(wrap, itertools.izip(itertools.repeat(np.correlate), x, y))
    t.close()
    t.join()

# mp.Pool.map
for i in range(2, 16, 2):
    print 'process pool ', i, ' : ',
    p = mp.Pool(i)
    %timeit -n10 p.map(wrap, itertools.izip(itertools.repeat(np.correlate), x, y))
    p.close()
    p.join()

The outputs:

# in this case, 1 cpu core usage reaches 100%
10 loops, best of 3: 3.16 ms per loop

# in this case, cpu core usages reach ~80%
thread pool   2  : 10 loops, best of 3: 4.03 ms per loop
thread pool   4  : 10 loops, best of 3: 3.3 ms per loop
thread pool   6  : 10 loops, best of 3: 3.16 ms per loop
thread pool   8  : 10 loops, best of 3: 4.48 ms per loop
thread pool  10  : 10 loops, best of 3: 4.19 ms per loop
thread pool  12  : 10 loops, best of 3: 4.03 ms per loop
thread pool  14  : 10 loops, best of 3: 4.61 ms per loop

# in this case, cpu core usages reach 80-100%
process pool   2  : 10 loops, best of 3: 71.7 ms per loop
process pool   4  : 10 loops, best of 3: 128 ms per loop
process pool   6  : 10 loops, best of 3: 165 ms per loop
process pool   8  : 10 loops, best of 3: 145 ms per loop
process pool  10  : 10 loops, best of 3: 259 ms per loop
process pool  12  : 10 loops, best of 3: 176 ms per loop
process pool  14  : 10 loops, best of 3: 176 ms per loop
  • Multi-threading does not increase the speed. That is acceptable due to the GIL (Global Interpreter Lock).

  • Multi-processing slows things down a lot, which is surprising. I have 8 CPUs at 3.78 GHz, each with 4 cores (a quick check of the core count is sketched below).
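As an aside, one quick way to confirm how many cores Python actually sees is sketched below; note that cpu_count() reports logical cores, which may include hyper-threads:

import multiprocessing as mp

# number of logical cores visible to Python; Pool() starts this many
# workers when no worker count is given
print mp.cpu_count()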

If the shape of x and y is increased to (300, 10000), i.e. 10 times larger, similar results can be seen.

But with small arrays of shape (20, 1000):

10 loops, best of 3: 28.9 µs per loop

thread pool  2  : 10 loops, best of 3: 429 µs per loop
thread pool  4  : 10 loops, best of 3: 632 µs per loop
...

process pool  2  : 10 loops, best of 3: 525 µs per loop
process pool  4  : 10 loops, best of 3: 660 µs per loop
...
  • Multi-processing and multi-threading have similar performance.
  • The single process is fastest, presumably due to the overheads of multi-processing and multi-threading (see the overhead sketch below).
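To see how much of this is fixed dispatch cost, one can map a function that does no work at all. This is only a sketch under the same setup, and noop is a hypothetical helper:

from multiprocessing import Pool

# noop does nothing, so any time measured here is pure pool overhead:
# pickling the argument, dispatching the task, returning the result
def noop(i):
    return i

p = Pool(4)
%timeit -n10 p.map(noop, range(30))
p.close()
p.join()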

Anyhow, even when executing such a simple function, it is beyond my expectation that multiprocessing performs this badly. How can that happen?

As suggested by @TrevorMerrifield, I modified the code to avoid passing big arrays to wrap.

from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool
import numpy as np

n = 30
m = 1000

# make the data inside wrap, so nothing big is passed to the workers
def wrap(i):
    x = np.random.rand(m)
    y = np.random.rand(m)
    return np.correlate(x, y)

# map
print 'single process :',
%timeit -n10 map(wrap, range(n))

# mp.dummy.Pool.map
print '---'
print 'thread pool %2d : ' % (4),
t = ThreadPool(4)
%timeit -n10 t.map(wrap, range(n))
t.close()
t.join()
print '---'

# mp.Pool.map; the function must be defined before making the pool
print 'process pool %2d : ' % (4),
p = Pool(4)
%timeit -n10 p.map(wrap, range(n))
p.close()
p.join()

The outputs:

single process : 10 loops, best of 3: 688 µs per loop
---
thread pool  4 : 10 loops, best of 3: 1.67 ms per loop
---
process pool  4 : 10 loops, best of 3: 854 µs per loop
  • Still no improvement.

I tried another way: passing only an index to wrap, which fetches the data from the global arrays x and y.

from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool
import numpy as np

# make data arrays
n = 30
m = 10000
x = np.random.rand(n, m)
y = np.random.rand(n, m)

# only an index is passed; the data lives in the global arrays
def wrap(i):
    return np.correlate(x[i], y[i])

# map
print 'single process :',
%timeit -n10 map(wrap, range(n))

# mp.dummy.Pool.map
print '---'
print 'thread pool %2d : ' % (4),
t = ThreadPool(4)
%timeit -n10 t.map(wrap, range(n))
t.close()
t.join()
print '---'

# mp.Pool.map; the function must be defined before making the pool
print 'process pool %2d : ' % (4),
p = Pool(4)
%timeit -n10 p.map(wrap, range(n))
p.close()
p.join()

The outputs:

single process : 10 loops, best of 3: 133 µs per loop
---
thread pool  4 : 10 loops, best of 3: 2.23 ms per loop
---
process pool  4 : 10 loops, best of 3: 10.4 ms per loop
  • That's even worse: 30 tasks finish in 133 µs in a single process, about 4 µs of work per task, so pool dispatch overhead dominates completely.

I tried another simple example (with a different wrap).

from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

n = 30
m = 10000

# no big arrays passed to wrap
def wrap(i):
    return sum(range(i, i + m))

# map
print 'single process :',
%timeit -n10 map(wrap, range(n))

# mp.dummy.Pool.map
print '---'
i = 4
print 'thread pool %2d : ' % (i),
t = ThreadPool(i)
%timeit -n10 t.map(wrap, range(n))
t.close()
t.join()
print '---'

# mp.Pool.map; the function must be defined before making the pool
print 'process pool %2d : ' % (i),
p = Pool(i)
%timeit -n10 p.map(wrap, range(n))
p.close()
p.join()

The timings:

10 loops, best of 3: 4.28 ms per loop
---
thread pool  4 : 10 loops, best of 3: 5.8 ms per loop
---
process pool  4 : 10 loops, best of 3: 2.06 ms per loop
  • Now multiprocessing is faster: the work is pure Python, so threads are serialized by the GIL, while only a small integer needs to be pickled per task.

But if m is made 10 times larger (i.e. 100000):

single process : 10 loops, best of 3: 48.2 ms per loop
---
thread pool  4 : 10 loops, best of 3: 61.4 ms per loop
---
process pool  4 : 10 loops, best of 3: 43.3 ms per loop
  • Again, hardly any improvement.

You are mapping wrap onto (a, b, c), where a is a function and b and c are 100k-element vectors. All of this data is pickled when it is sent to the chosen process in the pool, then unpickled when it reaches it. This ensures that processes have mutually exclusive access to the data.
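To get a feel for that cost, you can time a pickle round trip of a single vector; this is a rough sketch, assuming the same 100k-element arrays as above:

import pickle
import numpy as np

x = np.random.rand(100000)

# one serialize/deserialize round trip; map() pays this for every
# argument sent to a worker, and again for every result sent back
%timeit pickle.loads(pickle.dumps(x))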

Your problem is that the pickling is more expensive than the correlation. As a guideline, you want to minimize the amount of information sent between processes and maximize the amount of work each process does, while still spreading the work across the number of cores on the system.
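One way to apply that guideline to the original arrays is to send each worker one big chunk of rows instead of one row per task, so a single pickle round trip is amortized over many correlations. This is only a sketch: wrap_chunk is a hypothetical helper, and the 4-way striping is arbitrary.

import numpy as np
from multiprocessing import Pool

# each task carries ~8 rows, so the per-task pickling overhead is
# amortized over ~8 correlations instead of being paid per row
def wrap_chunk(args):
    xs, ys = args
    return [np.correlate(a, b) for a, b in zip(xs, ys)]

x = np.random.rand(30, 100000)
y = np.random.rand(30, 100000)

p = Pool(4)
chunks = [(x[i::4], y[i::4]) for i in range(4)]  # 4 interleaved chunks
results = p.map(wrap_chunk, chunks)
p.close()
p.join()

Pool.map's chunksize argument batches tasks in a similar way, although every row is still pickled individually.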

How to do that depends on the actual problem you're trying to solve. Tweaking your toy example so that the vectors are a bit bigger (1 million elements) and randomly generated inside the wrap function, I get a 2x speedup over a single core, using a process pool with 4 workers. The code looks like this:

import numpy as np
from multiprocessing import Pool

def wrap(a):
    # generate the (now larger) vectors inside the worker, so only the
    # small index argument and the small result cross process boundaries
    x = np.random.rand(1000000)
    y = np.random.rand(1000000)
    return np.correlate(x, y)

p = Pool(4)
p.map(wrap, range(30))
