Version 1 by Ivan Vujic
on May 13, 2014 10:23.

compared with
Version 2 by Ivan Vujic
on May 22, 2014 18:20.


(The on-site server numbers were read from a bar chart in the earlier report and are approximate.)
| | *On-Site* \\ *Server* | *Azure* \\ *VM* |
| *Tika* | 61 | 659 |
| *DROID* | 47 | 65 |
Table 1: Files per second
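
As a rough reading of Table 1, the relative speed-up of the Azure VM over the on-site server can be computed per tool. The sketch below only restates the table values (the on-site figures are approximate); the names {{files_per_second}} and {{speedup}} are illustrative and not part of the original study.

```python
# Files per second, copied from Table 1 (on-site values are approximate).
files_per_second = {
    "Tika": {"on_site": 61, "azure_vm": 659},
    "DROID": {"on_site": 47, "azure_vm": 65},
}


def speedup(rates):
    """Return how many times faster the Azure VM run was than the on-site run."""
    return rates["azure_vm"] / rates["on_site"]


for tool, rates in files_per_second.items():
    print(f"{tool}: {speedup(rates):.1f}x")
```

With these inputs the ratio is roughly 10.8 for Tika and 1.4 for DROID, so the gain differs considerably between the two tools.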

h1. Evaluation points


h5. Assessment of measurable points

|| Metric || Description || Metric baseline || Metric goal || _February 04, 2014_ || _May 21, 2014_ || _evaluation date_ ||
| NumberOfObjectsPerHour | Number of objects processed in one hour | | | 2371809 | 2447712 | |
| MinObjectSizeHandledInGbytes | Smallest ARC file in sample | | | 0.000000026 | 0.000000026 | |
| MaxObjectSizeHandledInGbytes | Biggest ARC file in sample | | | 0.347421616 | 0.347421616 | |
| ThroughputGbytesPerMinute | The throughput of data measured in Gbytes per minute | | | 20.0 | 20.7 | |
| ThroughputGbytesPerHour | The throughput of data measured in Gbytes per hour | | | 1201.8 | 1240.2 | |
| ReliableAndStableAssessment | Manual assessment of whether the experiment ran reliably and stably | | | true | true | |
| NumberOfFailedFiles | Number of files that failed in the workflow | | | N/A | 0 | |
| AverageRuntimePerItemInSeconds | The average processing time in seconds per item | | | 0.001517829 | 0.001470761 | |
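
Several of the metrics above are related arithmetically: the average runtime per item appears to be 3600 seconds divided by the number of objects per hour, and the per-minute throughput is the hourly throughput divided by 60. The sketch below recomputes both from the hourly figures as a consistency check. It assumes that this is how the derived values were obtained, which the report does not state, and the names {{runs}} and {{derived}} are illustrative.

```python
# Values copied from the metrics table, keyed by evaluation date.
runs = {
    "2014-02-04": {
        "objects_per_hour": 2371809,
        "gb_per_hour": 1201.8,
        "gb_per_minute": 20.0,
        "avg_seconds_per_item": 0.001517829,
    },
    "2014-05-21": {
        "objects_per_hour": 2447712,
        "gb_per_hour": 1240.2,
        "gb_per_minute": 20.7,
        "avg_seconds_per_item": 0.001470761,
    },
}


def derived(run):
    """Recompute per-item runtime and per-minute throughput from the hourly figures."""
    return {
        "avg_seconds_per_item": 3600 / run["objects_per_hour"],
        "gb_per_minute": run["gb_per_hour"] / 60,
    }


for date, run in runs.items():
    d = derived(run)
    print(date, f"{d['avg_seconds_per_item']:.9f}", f"{d['gb_per_minute']:.2f}")
```

The recomputed values agree with the reported ones to within rounding, which suggests the table is internally consistent.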

h1. Interpretation

The large difference in the results for the MD5 utility, more than an order of magnitude, indicates that a much faster file system was used on the Azure VM. Indeed, the Govdocs1 files were stored directly on the VM’s hard drive, whereas the files in the on-site study resided on a separate Network File System (NFS) server and were accessed over the network. If the files in the earlier study had been placed directly on the server (which probably wasn’t possible), the on-site performance would likely have been much better.

(Conversely, if one of the other Azure storage options had been used, such as Blob storage or a SQL Server database, the Azure performance would likely have been much worse.)

This suggests that if a very large collection of documents needs to be identified quickly, where the files are stored relative to the machine processing them has a major effect on throughput.
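
One way to check the effect of file location before a large run is to time the same MD5 pass over a sample of files in each candidate location. The following is a minimal sketch under stated assumptions, not the utility used in either study: the function name {{md5_files_per_second}} and the chunk size are hypothetical choices.

```python
import hashlib
import time
from pathlib import Path


def md5_files_per_second(root, chunk_size=1024 * 1024):
    """Hash every regular file under `root` with MD5.

    Returns a (file_count, files_per_second) tuple.
    """
    count = 0
    start = time.perf_counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.md5()
        with path.open("rb") as handle:
            # Read in chunks so large files do not have to fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate
```

Calling the function once on a local copy of the sample and once on the network mount gives two files-per-second figures that can be compared directly. Repeated runs over the same files may be served from the operating system's cache, so the first run in each location is usually the more representative one.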