Parallel processing of identification and characterisation jobs

Here are some ideas and suggestions regarding parallel processing, focusing on running identification and characterisation jobs in parallel. Please contribute and comment! Maurice de Rooij / 30-01-2012

IMHO we should also give multi/parallel processing more attention. Most (non-Java) tools are not designed to benefit from modern multi-core architectures. Luckily, this is easy to accomplish with simple wrapper scripts. However, there are potential pitfalls, such as race conditions, deadlocks and memory leaks.

A good example is the OPF format identification tool Fido, which is a single-process application. Although it is able to process around 200 files/second on most standard Linux distros, it should be possible to process that number of files times the number of processors you have available. On a blade system with 12 cores, the ideal figure would be 200 × 12 = 2400 files per second, so allowing for some overhead Fido should be able to process at least around 2000 files per second.

In order to prove and test this, I will bring a prototype Python multi-process wrapper with me, which could then be re-used for other purposes.

Also, please note that I explicitly do not mention multithreading, which is a different kind of beast. Multithreading suffers from the same pitfalls as multiprocessing, possibly even more so, because in most cases multiple threads share resources.

Maurice de Rooij / 22-01-2013
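
To illustrate the wrapper idea described above, here is a minimal sketch. It is not the prototype mentioned in the comment. It assumes the wrapped tool accepts several file paths as trailing command-line arguments, and it uses the standard subprocess module to start one child process per chunk of files. The names split_into_chunks and run_parallel are hypothetical, and the demo command simply echoes its arguments as a stand-in for a real tool.

```python
import os
import subprocess
import sys
import tempfile


def split_into_chunks(items, n_chunks):
    """Distribute items round-robin over at most n_chunks non-empty lists."""
    if not items:
        return []
    n_chunks = max(1, min(n_chunks, len(items)))
    return [items[i::n_chunks] for i in range(n_chunks)]


def run_parallel(command, paths, n_procs=None):
    """Run `command` once per chunk of `paths`, with all chunks running at once.

    Returns a list of (returncode, stdout_text) tuples, one per chunk.
    """
    n_procs = n_procs or os.cpu_count() or 1
    jobs = []
    for chunk in split_into_chunks(list(paths), n_procs):
        # Each child writes to its own temporary file, so the children share
        # no output stream and cannot stall on a full pipe buffer.
        out = tempfile.TemporaryFile(mode="w+")
        proc = subprocess.Popen(list(command) + chunk, stdout=out)
        jobs.append((proc, out))

    results = []
    for proc, out in jobs:
        returncode = proc.wait()  # block until this child has finished
        out.seek(0)
        results.append((returncode, out.read()))
        out.close()
    return results


if __name__ == "__main__":
    # Stand-in command that only echoes its arguments. Replace it with the
    # real tool invocation (an assumption: the tool takes paths as arguments).
    echo = [sys.executable, "-c", "import sys; print('\\n'.join(sys.argv[1:]))"]
    for code, text in run_parallel(echo, sys.argv[1:], n_procs=4):
        sys.stdout.write(text)
```

Because every child is an independent process with its own output file, nothing is shared between the workers, which sidesteps the race conditions mentioned above. The trade-off is that the output arrives per chunk rather than in the original file order.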

Maurice, dunno if you're interested, but this http://www.boddie.org.uk/python/pprocess.html might help. I also wonder if we can use something like ppss http://code.google.com/p/ppss/ ?

: Thanks, I hadn't seen these yet. I've been experimenting with Python's standard multiprocessing package, which is quite alright. Maurice de Rooij / 28-01-2013

You may also like GNU Parallel (http://www.gnu.org/software/parallel/), which makes simple multiprocessing somewhat more accessible. Andrew Jackson/2013-01-28

The Python Wiki is back after "an attack" - it has a very good list: http://wiki.python.org/moin/ParallelProcessing Peter Cliff/2013-01-28
