View Source

FITS profiling: [~gmcgath]

To see where the performance issues are in FITS, I ran OpenFits with JIP, a free Java profiling tool. I had to exclude a number of third-party packages from tabulation before I could get it to run without exhausting memory.

The first successful run was on a TIFF file (grayscale_8bpp_wrong_bpptag.tif from the OPF format corpus). Each thread is reported separately, but totals per method are provided in a summary.

In this case, the most expensive methods reported were:

 Count     Time    Pct  Location
 =====     ====    ===  ========
&nbsp;&nbsp; 249&nbsp;&nbsp; 5070.0&nbsp; 203.2&nbsp; org.apache.xerces.util.URI:<init>
&nbsp;&nbsp;&nbsp; 10&nbsp;&nbsp; 3187.1&nbsp; 182.3&nbsp; edu.harvard.hul.ois.fits.tools.ToolOutput:validateXmlOutput
&nbsp;&nbsp;&nbsp;&nbsp; 4&nbsp;&nbsp; 3142.6&nbsp;&nbsp; 70.6&nbsp; uk.gov.nationalarchives.droid.signatureFile.SubSequence:isFoundAfterFileMarker
&nbsp;&nbsp;&nbsp;&nbsp; 6&nbsp;&nbsp; 1168.7&nbsp;&nbsp; 55.0&nbsp; edu.harvard.hul.ois.fits.tools.ToolBase:transform
153246&nbsp;&nbsp; 1048.4&nbsp;&nbsp; 39.8&nbsp; com.twmacinta.util.MD5:Transform
&nbsp;&nbsp;&nbsp;&nbsp; 7&nbsp;&nbsp;&nbsp; 966.4&nbsp;&nbsp; 27.1&nbsp; edu.harvard.hul.ois.fits.tools.utils.CommandLine:exec
&nbsp;&nbsp;&nbsp; 14&nbsp;&nbsp;&nbsp; 858.9&nbsp; 1400.0&nbsp; edu.harvard.hul.ois.fits.tools.utils.StreamGobbler:run
&nbsp;&nbsp; &nbsp;
I don't know what "Pct" means, since it's often way above 100%, but the count and time totals are more useful. All times are in milliseconds.

validateXmlOutput subsumes the actual calls to the tools, and each call to it invokes a URI initialization. In spite of this, less total time is spent on it than on URI initializations. (This is possible because there are other URI initializations not called under validateXmlOutput).

isFoundAfterFileMarker is specific to Droid, and I don't know what to make of it. CommandLine.exec and StreamGobble.run are high-level functions, so the fact that the consume a lot of time doesn't tell us anything.

We can also look at the time spent for each tool. These are:
1831.4&nbsp;&nbsp; &nbsp;AudioInfo
3222.4&nbsp;&nbsp; &nbsp;Jhove
1790.2&nbsp; ADLTool
2821.6&nbsp; Exiftool
2642.2&nbsp; FileUtility
4450.8&nbsp; Droid
2631.4&nbsp; FileInfo
2773.8&nbsp; MetadataExtractor (NLNZ)
1660.7&nbsp; FFIdent
1538.0&nbsp; XmlMetadata

The initialization of org.apache.xerces.util.URI is surprisingly expensive. It's called a lot of times and takes a fair amount of time per call.


Next I ran FITS on a directory containing five WAVE files. There are now a lot of threads, since a thread is invoked for each use of each tool on a file. The most expensive methods now are:

&nbsp;Count&nbsp;&nbsp;&nbsp;&nbsp; Time&nbsp;&nbsp;&nbsp; Pct&nbsp; Location
&nbsp;=====&nbsp;&nbsp;&nbsp;&nbsp; ====&nbsp;&nbsp;&nbsp; ===&nbsp; ========
&nbsp; 1080&nbsp; 15612.7&nbsp; 1549.3&nbsp; org.apache.xerces.util.URI:<init>
&nbsp;25752&nbsp; 14017.0&nbsp; 144.6&nbsp; uk.gov.nationalarchives.droid.signatureFile.SubSequence:endBytePosForSeqFrag
2815767&nbsp;&nbsp; 9155.1&nbsp;&nbsp; 92.1&nbsp; uk.gov.nationalarchives.droid.signatureFile.ByteSeqSpecifier:matchesByteSequence
&nbsp;&nbsp;&nbsp; 47&nbsp;&nbsp; 5097.9&nbsp; 406.5&nbsp; edu.harvard.hul.ois.fits.tools.ToolOutput:validateXmlOutput
&nbsp;&nbsp;&nbsp; 19&nbsp;&nbsp; 2527.2&nbsp; 182.0&nbsp; edu.harvard.hul.ois.fits.tools.utils.CommandLine:exec
&nbsp;&nbsp;&nbsp; 38&nbsp;&nbsp; 1724.4&nbsp; 3800.0&nbsp; edu.harvard.hul.ois.fits.tools.utils.StreamGobbler:run

Again URI initialization is the biggest single factor.

This suggested that possible that some efficiency could be gained by doing more static initialization where URIs are involved. I went for an easy fix and pulled out all Namespace initializers involving fixed URIs into private static declarations.

After this, the most expensive method summary gave:

Count&nbsp;&nbsp;&nbsp;&nbsp; Time&nbsp;&nbsp;&nbsp; Pct&nbsp; Location
&nbsp;=====&nbsp;&nbsp;&nbsp;&nbsp; ====&nbsp;&nbsp;&nbsp; ===&nbsp; ========
&nbsp;25752&nbsp; 14460.4&nbsp; 146.4&nbsp; uk.gov.nationalarchives.droid.signatureFile.SubSequence:endBytePosForSeqFrag
&nbsp; 1080&nbsp; 13142.4&nbsp; 1411.5&nbsp; org.apache.xerces.util.URI:<init>
2815767&nbsp;&nbsp; 9069.2&nbsp;&nbsp; 91.9&nbsp; uk.gov.nationalarchives.droid.signatureFile.ByteSeqSpecifier:matchesByteSequence
&nbsp;&nbsp;&nbsp; 47&nbsp;&nbsp; 5389.4&nbsp; 409.3&nbsp; edu.harvard.hul.ois.fits.tools.ToolOutput:validateXmlOutput
&nbsp;&nbsp;&nbsp; 38&nbsp;&nbsp; 3052.4&nbsp; 3800.0&nbsp; edu.harvard.hul.ois.fits.tools.utils.StreamGobbler:run
&nbsp;&nbsp;&nbsp; 19&nbsp;&nbsp; 2862.8&nbsp; 215.4&nbsp; edu.harvard.hul.ois.fits.tools.utils.CommandLine:exec

This is about a 15% saving on the URI init, pushing it down to second place. All the other calls took longer; I don't think factoring out the namespaces should have slowed them down, so some external factor may just have dragged down this run by a bit.

This isn't a huge gain, but the coding effort for it was very slight.

Each tool in the directory ran five times. Here are the time totals (after optimization of initializers):
AudioInfo: 1928.4, 296.2, 373.1, 488.5, 244.0
Jhove: 2213.1, 1147.5, 118.5, 210.7, 74.2
ADLTool: 1805.4, 299.9, 586.4, 455.9, 310.8
Exiftool: 2185.7, 1208.6, 1016.7, 836.5, 853.0
FileUtility: 2137.4, 1140.9, 1033.9, 990.5, 848.0
Droid: 1929.6, 1193.8, 1184.9, 20974.4, 6727.1
FileInfo: 1876.6, 328.3, 553.2, 591.4, 504.7
MetadataExtractor: 2212.6, 841.7, 686.2, 524.0, 549.3
FFIdent: 1726.0, 466.7, 509.4, 454.6, 257.5
XmlMetadata: 1803.9, 368.3, 234.7, 370.8, 293.7

There is significant overhead for the first run of all the tools. A significant part of this is certainly due to the class loader.

Here are the corresponding totals before optimization of initializers:
AudioInfo: 2004.5, 197.5, 181.2, 358.1, 381.4
Jhove: 2673.7, 1330.0, 126.6, 90.5, 202.5
ADLTool: 2004.5, 387.3, 171.0, 200.0, 352.7
Exiftool: 2603.2, 1270.4, 903.1, 948.7, 938.8
FileUtility: 2536.5, 1231.7, 728.2, 843.2, 899.5
Droid: 2279.1, 1365.2, 930.3, 20582.1, 6782.7
FileInfo: 1779.8, 440.0, 407.3, 507.1, 478.6
MetadataExtractor: 2588.9, 743.6, 406.1, 594.2, 512.4
FFIdent: 2242.5, 492.7, 362.2, 487.8, 329.9
XmlMetadata: 1696.1, 380.3, 318.0, 409.7, 441.6

Some proper analysis should be done on these numbers, but just from a glance, it appears that the run with optimized initializers has a greater relative saving of time, suggesting that the benefits of the optimization would increase with longer runs.