The bigger the data, the smaller the influence of the job startup overhead. It is very interesting to see that reflected in our own experiments, and to see how much it actually affects performance.
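To make the effect concrete, here is a minimal sketch of a simple model in which a job's runtime is a fixed startup cost plus processing time proportional to the data size. The function name `overhead_share` and both constants are assumptions chosen only for illustration; they are not measurements from our cluster.

```python
def overhead_share(startup_s: float, data_gb: float, gb_per_s: float) -> float:
    """Fraction of the total job time spent on the fixed startup overhead."""
    processing_s = data_gb / gb_per_s
    return startup_s / (startup_s + processing_s)


# Hypothetical values, picked only to show the trend (not measured).
STARTUP_S = 30.0   # assumed fixed job startup cost in seconds
GB_PER_S = 0.1     # assumed processing throughput in GB per second

for size_gb in (1, 10, 100):
    share = overhead_share(STARTUP_S, size_gb, GB_PER_S)
    print(f"{size_gb:>4} GB: {share:.1%} of the runtime is startup overhead")
```

Whatever the real constants are, the share of the fixed cost shrinks as the input grows, which is the trend described above.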

*The 197 GB experiment described above is comparable* to a task that we also regularly perform in a "traditional way" (as a Java program running on a 4-core HT server). That task usually takes around 6 hours and 45 minutes to complete the traditional way. Compared with that, the 36 minutes of the experiment above (roughly 11 times faster) is an excellent result for our "poor hardware cluster".
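The comparison can be checked with a few lines of arithmetic. This is just a sketch using the two durations quoted above; the variable names are ours.

```python
# Durations quoted in the text, converted to minutes.
traditional_min = 6 * 60 + 45   # 6 hours 45 minutes on the single server
cluster_min = 36                # 197 GB experiment on the cluster

speedup = traditional_min / cluster_min
saved_min = traditional_min - cluster_min

print(f"speedup: {speedup:.2f}x, "
      f"time saved: {saved_min // 60} h {saved_min % 60} min")
# prints "speedup: 11.25x, time saved: 6 h 9 min"
```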