Title |
Classification of files within a disk image |
Detailed description | The first stage of our digital preservation process is to create a disk image of any media we receive. This gives us a single file which contains all the contents and structures of the original media. These disk images can contain thousands of individual files. As we ingest a disk we make a note of the basic genre, scope and content of the material on the disk (as well as other technical and descriptive metadata). This can be very time-consuming, so it would be helpful to have some way of generating a list of keywords or subjects for the text files within the disk image, giving us an overview of the material on a disk and what it relates to. |
Issue champion | |
Other interested parties |
Any other parties who are also interested in applying Issue Solutions to their Datasets |
Possible Solution approaches | Analysis of Lucene Index Word Frequency |
Context | Details of the institutional context to the Issue. (May be expanded at a later date) |
Lessons Learned | Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice |
Datasets | Disk Images |
Solutions | Reference to the appropriate Solution page(s), by hyperlink |
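
As a rough illustration of the word-frequency approach listed under Possible Solution approaches, the sketch below counts terms across text files and reports the most common ones as candidate keywords. It is not the Lucene-based analysis itself: it uses plain Python as a stand-in, and it assumes the files have already been extracted from the disk image into an ordinary directory. The function names, the `*.txt` pattern and the short stop-word list are all hypothetical choices made for the example.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately small stop-word list; a real run would need a fuller one.
STOP_WORDS = {
    "the", "and", "of", "to", "a", "in", "is", "it", "that",
    "for", "on", "with", "as", "this", "be", "are", "was",
}

# Runs of three or more ASCII letters; shorter tokens are ignored.
TOKEN = re.compile(r"[a-z]{3,}")


def term_frequencies(text):
    """Count lower-cased alphabetic tokens in text, skipping stop words."""
    return Counter(t for t in TOKEN.findall(text.lower()) if t not in STOP_WORDS)


def keywords_for_directory(root, top_n=10, pattern="*.txt"):
    """Sum term counts over matching files under root and return the top_n terms.

    Assumes root holds files already extracted from a disk image.
    """
    totals = Counter()
    for path in Path(root).rglob(pattern):
        if path.is_file():
            # Ignore undecodable bytes so one odd file does not stop the run.
            text = path.read_text(encoding="utf-8", errors="ignore")
            totals.update(term_frequencies(text))
    return totals.most_common(top_n)
```

Calling `keywords_for_directory("extracted_disk/")` (a hypothetical path) would return a list of `(term, count)` pairs. Raw counts favour common words, so a fuller stop-word list or a weighting such as TF-IDF would probably be needed before the output is useful as subject keywords.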