Here are some ideas and suggestions regarding parallel processing, focusing on running identification and characterisation jobs in parallel. Please contribute and comment! Maurice de Rooij / 30-01-2012
IMHO we should also give multi-processing and parallel processing more attention. Most (non-Java) tools are not designed to benefit from modern multi-core architectures. Fortunately, this can often be achieved with simple wrapper scripts. However, there are potential pitfalls, such as race conditions, deadlocks and memory leaks.
A good example is Fido, the OPF format identification tool, which runs as a single process. Although it can process around 200 files per second on most standard Linux distributions, it should be possible to process that number of files multiplied by the number of available processor cores. On a blade system with 12 cores, Fido should therefore be able to process at least around 2000 files per second (perfectly linear scaling would give 12 × 200 = 2400).
To test and demonstrate this, I will bring a prototype Python multi-process wrapper, which could then be reused for other purposes.
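A minimal sketch of what such a wrapper could look like (not the prototype itself), assuming the tool is started as an external command that accepts a list of file paths as arguments. The file list is split into one chunk per CPU core and one tool process is started per chunk. `run_in_parallel` and `split_round_robin` are hypothetical names, and `cmd` is a placeholder for the real command line.

```python
import os
import subprocess
import tempfile


def split_round_robin(items, n):
    """Distribute items over n chunks: item i goes to chunk i % n."""
    chunks = [[] for _ in range(n)]
    for i, item in enumerate(items):
        chunks[i % n].append(item)
    return [c for c in chunks if c]  # drop empty chunks when items < n


def run_in_parallel(cmd, paths, workers=None):
    """Run `cmd + chunk` as separate OS processes, one per chunk.

    `cmd` is a placeholder list such as ["some-tool"]; the tool is
    assumed (hypothetically) to accept file paths as arguments.
    Returns the combined stdout lines; line order across chunks is
    not meaningful.
    """
    workers = workers or os.cpu_count() or 1
    jobs = []
    for chunk in split_round_robin(list(paths), workers):
        # Each process writes to its own temporary file instead of a pipe,
        # so a process with a lot of output never stalls on a full pipe
        # buffer while the wrapper is still waiting on another process.
        out = tempfile.TemporaryFile(mode="w+")
        jobs.append((subprocess.Popen(cmd + chunk, stdout=out), out))

    lines = []
    for proc, out in jobs:
        returncode = proc.wait()  # all processes are already running
        out.seek(0)
        lines.extend(out.read().splitlines())
        out.close()
        if returncode != 0:
            raise RuntimeError(f"{cmd[0]} exited with status {returncode}")
    return lines
```

On the 12-core blade mentioned above, `run_in_parallel(cmd, paths, workers=12)` would start 12 tool processes at once. For very large file lists the command line may exceed the operating system's argument length limit, in which case the list would have to be handed to the tool some other way, if it supports one.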
Please also note that I deliberately leave out multithreading, which is a different kind of beast. Multithreading suffers from the same pitfalls as multiprocessing, possibly more so, because threads within one process usually share resources such as memory.
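The difference is mainly one of degree: threads share their process's memory by default, while separate processes only share what is explicitly shared. As soon as processes do share something, they need the same care. A minimal sketch, assuming a Linux host (it requests the POSIX-only `fork` start method so that child processes do not re-import the calling script): several processes increment one counter in shared memory, and the lock around the read-modify-write is what keeps increments from being lost. `count_files` and `parallel_count` are hypothetical names.

```python
import multiprocessing as mp


def count_files(counter, n):
    """Increment a counter shared between processes n times."""
    for _ in range(n):
        # `counter.value += 1` is a read-modify-write: without the lock,
        # two processes can read the same old value and one increment
        # is lost (a race condition).
        with counter.get_lock():
            counter.value += 1


def parallel_count(workers=4, n=10_000):
    ctx = mp.get_context("fork")  # POSIX only; assumes a Linux host
    counter = ctx.Value("i", 0)  # an int in shared memory, with its own lock
    procs = [
        ctx.Process(target=count_files, args=(counter, n))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value
```

With the lock, `parallel_count(4, 10_000)` always returns 40000; removing the `with counter.get_lock():` line can make the result come out lower.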
Maurice de Rooij / 22-01-2013
: Thanks, I haven't seen these yet. I've been experimenting with the standard Python multiprocessing package, which works quite well. Maurice de Rooij / 28-01-2013
The Python Wiki is back after "an attack"; it has a very good list: http://wiki.python.org/moin/ParallelProcessing Peter Cliff / 2013-01-28
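For CPU-bound work written in Python itself, the standard `multiprocessing` package provides a process pool. A minimal sketch, using a SHA-256 checksum per file as a stand-in for a characterisation job and again assuming a Linux host with the `fork` start method. `sha256_of_file` and `checksum_all` are hypothetical names.

```python
import hashlib
import multiprocessing as mp


def sha256_of_file(path):
    """Characterise one file: return (path, SHA-256 hex digest)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB blocks so large files do not have to fit in memory.
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return path, h.hexdigest()


def checksum_all(paths, workers=None, chunksize=16):
    """Checksum all paths in parallel; workers=None uses os.cpu_count()."""
    ctx = mp.get_context("fork")  # POSIX only; assumes a Linux host
    with ctx.Pool(processes=workers) as pool:
        # imap_unordered yields each result as soon as any worker finishes;
        # chunksize hands each worker several paths per task, which cuts
        # inter-process communication overhead for long file lists.
        return dict(pool.imap_unordered(sha256_of_file, paths, chunksize=chunksize))
```

Because each file is handled independently, the order in which results arrive does not matter here, which is why `imap_unordered` is used rather than `map`.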