
| *Title* \\ | Classification of files within a disk image |
| *Detailed description* | The first stage of our digital preservation process is to create a disk image of any media we receive. This gives us a single file which contains all the contents and structures of the original media. These disk images can contain thousands of individual files. \\
\\
As we ingest a disk we make a note of the basic genre, scope and content of the material on the disk (as well as other technical and descriptive metadata). This can be very time-consuming, so it would be helpful to have some way of generating a list of keywords or subjects for the text files within the disk image, giving us an overview of the material on a disk and what it relates to. \\ |
| *Issue champion* | [~rebeccan] |
| *Other interested parties* \\ | _Any other parties who are also interested in applying Issue Solutions to their Datasets_ |
| *Possible Solution approaches* | [AQuA:Analysis of Lucene Index Word Frequency]\\ |
| *Context* | _Details of the institutional context to the Issue. (May be expanded at a later date)_ \\ |
| *Lessons Learned* | _Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice_ \\ |
| *Datasets* | [SPR:Disk Images]\\ |
| *Solutions* | _Reference to the appropriate Solution page(s), by hyperlink_ |
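
To illustrate the kind of keyword overview described in the issue, the following is a minimal sketch only, not the approach taken in any linked Solution. It assumes the files have already been extracted from the disk image into a directory, that plain-text files can be recognised by a `.txt` extension, and that a simple word-frequency count is an acceptable stand-in for subject keywords. The names `STOP_WORDS`, `count_words` and `keywords_for_tree` are hypothetical, and the stop-word list is deliberately tiny.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately small stop-word list; a real run would need a fuller one.
STOP_WORDS = {
    "the", "and", "for", "that", "this", "with", "was", "are",
    "were", "from", "have", "has", "not", "but", "you", "our",
}

# Assumption: a "word" is a run of three or more ASCII letters.
WORD_RE = re.compile(r"[a-z]{3,}")


def count_words(text):
    """Count candidate keywords in one piece of text, skipping stop words."""
    words = WORD_RE.findall(text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def keywords_for_tree(root, suffixes=(".txt",), top_n=10):
    """Return the top_n most frequent words across text files under root.

    root is assumed to be a directory of files already extracted
    from the disk image; files with other suffixes are ignored.
    """
    totals = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            # Ignore undecodable bytes rather than failing on odd encodings.
            text = path.read_text(encoding="utf-8", errors="ignore")
            totals.update(count_words(text))
    return totals.most_common(top_n)
```

Calling `keywords_for_tree("extracted_image", top_n=20)` (the directory name is hypothetical) would give a list of `(word, count)` pairs that could be pasted into the scope and content notes as a starting point. A raw frequency count is crude and says nothing about genre, so the result would still need review by the person ingesting the disk.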