||Classification of files within a disk image|
|Detailed description|| The first stage of our digital preservation process is to create a disk image of any media we receive. This gives us a single file which contains all the contents and structures of the original media. These disk images can contain thousands of individual files.
As we ingest a disk we make a note of the basic genre, scope and content of the material on the disk (as well as other technical and descriptive metadata). This can be very time-consuming, so it would be helpful if there were some way to generate a list of keywords or subjects of text files within the disk image, so that we can get an overview of the material on a disk and what it relates to.
|Issue champion||Rebecca Nielsen|
|Other interested parties||Any other parties who are also interested in applying Issue Solutions to their Datasets|
|Possible Solution approaches|| Analysis of Lucene Index Word Frequency
|Context|| Details of the institutional context to the Issue. (May be expanded at a later date)
|Lessons Learned|| Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice
|Datasets|| Disk Images
|Solutions||Reference to the appropriate Solution page(s), by hyperlink|
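
The word-frequency idea listed under Possible Solution approaches could be prototyped without Lucene as a first step. The following is a minimal sketch only, not the method used by the issue champion: it assumes the text files have already been extracted from the disk image into a directory, that they are plain text with a `.txt` extension, and that a short hand-written stop-word list is good enough for a rough overview. The names `STOP_WORDS`, `count_terms` and `top_keywords` are hypothetical and chosen for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Small illustrative stop-word list (an assumption; a real list would be longer).
STOP_WORDS = frozenset({
    "the", "and", "for", "that", "this", "with", "was", "are",
    "from", "have", "has", "not", "but", "you", "were", "which",
})

# Tokens are runs of three or more ASCII letters (a simplifying assumption).
TOKEN = re.compile(r"[a-z]{3,}")


def count_terms(text):
    """Count lower-cased tokens in one text, skipping stop words."""
    return Counter(t for t in TOKEN.findall(text.lower()) if t not in STOP_WORDS)


def top_keywords(root, limit=20, suffixes=(".txt",)):
    """Return the most frequent terms across text files found under root.

    root is assumed to be a directory holding files already extracted
    from the disk image; reading the image itself is out of scope here.
    """
    totals = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            # Undecodable bytes are ignored so odd encodings do not stop the run.
            text = path.read_text(encoding="utf-8", errors="ignore")
            totals.update(count_terms(text))
    return totals.most_common(limit)
```

Calling `top_keywords("extracted_files")` on a hypothetical extraction directory would return a list of `(term, count)` pairs that could be pasted into the scope and content notes. Raw counts favour common words, so a Lucene index or a weighting such as TF-IDF would likely give more useful keywords on large images.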