As part of developing the preservation strategy, there is also a need to map duplicates and to check the authenticity and integrity of files. Part of this process is checksumming: using tools such as Fastsum, and looking at how the resulting checksum information is stored and built into the business workflow.
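As a rough illustration of how checksums can be used to map duplicates, the sketch below groups files by their MD5 digest and reports any digest shared by more than one file. It is written in Python rather than with any particular checksum tool, and the function names (`file_md5`, `find_duplicates`) are made up for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` by checksum and keep only groups with more than one file."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_md5(path)].append(path)
    return {checksum: paths for checksum, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates` on the root folder of the sample returns a dictionary mapping each shared checksum to the list of files that carry it. Matching MD5 values are strong evidence that files are identical, but a byte-by-byte comparison would be a sensible final check before anything is deleted.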
'md5sum' could also be run from the Cygwin command line on Windows to generate checksums of files at no cost. Combined with 'find' and 'xargs', it could be used to generate a manifest covering a whole directory tree.
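The manifest idea can be sketched as follows. This is a Python equivalent of the approach, not the 'find'/'xargs' pipeline itself. It walks a directory tree, writes one line per file in the same "checksum, two spaces, path" layout that 'md5sum' prints, and can later re-check the files against that manifest. The function names and the choice to store relative paths are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Write '<md5>  <relative path>' for every file under root; return the file count."""
    root, manifest_path = Path(root), Path(manifest_path)
    count = 0
    with open(manifest_path, "w", encoding="utf-8", newline="\n") as out:
        for path in sorted(root.rglob("*")):
            # Skip the manifest itself in case it is stored inside root.
            if path.is_file() and path.resolve() != manifest_path.resolve():
                out.write(f"{file_md5(path)}  {path.relative_to(root).as_posix()}\n")
                count += 1
    return count


def verify_manifest(root, manifest_path):
    """Return the relative paths that are missing or whose checksum has changed."""
    root = Path(root)
    failures = []
    with open(manifest_path, encoding="utf-8") as lines:
        for line in lines:
            if not line.strip():
                continue
            expected, _, relative = line.rstrip("\n").partition("  ")
            target = root / relative
            if not target.is_file() or file_md5(target) != expected:
                failures.append(relative)
    return failures
```

Running `write_manifest` once when files are ingested and `verify_manifest` at intervals afterwards gives a basic fixity check: an empty list means every listed file is still present and unchanged. Note that files added after the manifest was written are not reported.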
It may also be worth considering BagIt (http://en.wikipedia.org/wiki/BagIt), which separates the data being manifested from the manifest (metadata) itself whilst maintaining a link between the two.
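To make the BagIt layout concrete, here is a deliberately simplified Python sketch that copies a folder into a bag: the payload goes under `data/`, while the bag declaration (`bagit.txt`) and the payload manifest (`manifest-md5.txt`) sit beside it. It covers only these core elements and is not a substitute for a maintained BagIt tool or library. The function name `make_minimal_bag` is hypothetical. MD5 is used here only to match the tools discussed above, and later versions of the specification encourage stronger algorithms such as SHA-256 or SHA-512.

```python
import hashlib
import shutil
from pathlib import Path


def make_minimal_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write bagit.txt and manifest-md5.txt."""
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    payload_dir = bag_dir / "data"
    # copytree creates bag_dir as needed and fails if bag_dir/data already exists.
    shutil.copytree(source_dir, payload_dir)

    # Bag declaration: marks the directory as a bag and states the tag file encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Payload manifest: one checksum per payload file, with paths relative to the bag root.
    lines = []
    for path in sorted(payload_dir.rglob("*")):
        if path.is_file():
            # read_bytes loads the whole file; fine for a sketch, not for very large files.
            checksum = hashlib.md5(path.read_bytes()).hexdigest()
            lines.append(f"{checksum}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-md5.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir
```

The point of the layout is that the original files are left untouched inside `data/`, while the manifest lives outside the payload but refers to each file by path, so data and metadata stay separate yet linked.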
Other interested parties
Any other parties who are also interested in applying Issue Solutions to their Datasets.
Possible Solution approaches
- testing out preservation planning tools
- suggestions of what worked/what didn't work from other developers/practitioners
This is part of a wider attempt to develop a robust preservation strategy for a small but growing digital repository in a large but underfunded local authority. The dataset is a small sample from a larger collection of digital assets with a variety of file types, descriptive metadata, etc.
Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice
Reference to the appropriate Solution page(s), by hyperlink.