Classification of files within a disk image

Title Classification of files within a disk image
Detailed description The first stage of our digital preservation process is to create a disk image of any media we receive. This gives us a single file that contains all the contents and structures of the original media. These disk images can contain thousands of individual files.

As we ingest a disk, we make a note of the basic genre, scope and content of the material on the disk (as well as other technical and descriptive metadata). This can be very time-consuming, so it would be helpful if there were some way to generate a list of keywords or subjects for the text files within the disk image, giving us an overview of the material on a disk and what it relates to.
Issue champion Rebecca Nielsen
Other interested parties
Any other parties who are also interested in applying Issue Solutions to their Datasets
Possible Solution approaches Analysis of Lucene Index Word Frequency
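A minimal sketch of the word-frequency idea, assuming the disk image's file system has already been mounted or its files extracted to a directory (reading the raw image directly would need a forensic library, which is not shown). It does not use Lucene: it is a standard-library stand-in that gathers the kind of per-term statistics a Lucene index keeps (total term frequency and document frequency) and ranks terms with a TF-IDF-style weight, which is not Lucene's own scoring formula. The stop-word list, the set of "text" file extensions, the `min_docs` threshold and the function names are all hypothetical choices for illustration.

```python
import math
import re
import sys
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately small stop-word list; a real analyzer would use a fuller one.
STOP_WORDS = {
    "the", "and", "of", "to", "a", "in", "is", "it", "that", "for",
    "on", "with", "as", "was", "be", "this", "are", "by", "at", "or",
}

# Hypothetical choice of which file extensions count as "text files".
TEXT_SUFFIXES = {".txt", ".csv", ".htm", ".html", ".xml", ".md"}

# Alphabetic tokens of two or more characters (apostrophes allowed after the first letter).
TOKEN_RE = re.compile(r"[a-z][a-z']+")


def tokenize(text: str) -> list[str]:
    """Lower-case the text, split it into alphabetic tokens and drop stop words."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


def build_term_stats(root: Path) -> tuple[Counter, Counter, int]:
    """Walk the extracted files and collect, per term, the total number of
    occurrences and the number of files it appears in."""
    total_freq: Counter = Counter()
    doc_freq: Counter = Counter()
    n_docs = 0
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        tokens = tokenize(path.read_text(encoding="utf-8", errors="replace"))
        if not tokens:
            continue
        n_docs += 1
        total_freq.update(tokens)
        doc_freq.update(set(tokens))  # each file counts at most once per term
    return total_freq, doc_freq, n_docs


def top_keywords(root: Path, k: int = 20, min_docs: int = 2) -> list[tuple[str, float]]:
    """Rank terms by total frequency times a smoothed inverse document frequency.
    Terms found in fewer than `min_docs` files are skipped so that one-off words
    do not dominate a disk-level overview."""
    total_freq, doc_freq, n_docs = build_term_stats(root)
    scores = {
        term: tf * math.log(1 + n_docs / doc_freq[term])
        for term, tf in total_freq.items()
        if doc_freq[term] >= min_docs
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__" and len(sys.argv) > 1:
    for term, score in top_keywords(Path(sys.argv[1])):
        print(f"{score:8.2f}  {term}")
```

Run against the directory holding the extracted files, e.g. `python keywords.py /mnt/disk_image`, to print a ranked keyword list that can be checked against the genre, scope and content notes made at ingest. Weighting by document frequency is one design option among several: it stops words that appear in every file from crowding out terms that characterise a subset of the material.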
Context Details of the institutional context to the Issue. (May be expanded at a later date)
Lessons Learned Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice
Datasets Disk Images
Solutions Reference to the appropriate Solution page(s), by hyperlink
Labels: issue, spruce_glasgow, spruce, unsolved_issue