Title | Identification of file formats with incorrect file extensions |
Detailed description | Electronic documents and image files are deposited on a variety of media, including floppy discs, CD-Rs and memory sticks. During copying, file extensions can be lost, or a period in a file name can cause everything after it to be read as the file extension. Either way, the file becomes unreadable because the correct file association has been lost. It is time-consuming to establish whether an unrecognised extension is genuine but belongs to an unusual file type, or is simply incorrect. The process currently used to identify file types and access content is time-consuming and hit-and-miss, and it may affect the authenticity and integrity of the files. |
Issue champion | |
Other interested parties | |
Possible Solution approaches | |
Context | Details of the institutional context to the Issue. (May be expanded at a later date) |
Lessons Learned | Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice |
Datasets | Seven Stories author & illustrator files |
Solutions | Tika Batch File Identification |
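The problem in the detailed description, and one way around it, can be sketched in Python. This is a minimal illustration under stated assumptions, not the method used by the Tika solution listed above: it ignores the file name and inspects the first bytes of the content (the "magic number"). The signature table is a small hand-picked sample, the format labels are informal, and the names `SIGNATURES`, `sniff_bytes`, `sniff_file` and `survey` are hypothetical.

```python
import os

# Illustrative sample of well-known file signatures; far from complete.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"PK\x03\x04", "ZIP container (also DOCX, XLSX, ODT)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (legacy DOC, XLS, PPT)"),
    (b"{\\rtf", "RTF"),
]


def sniff_bytes(header: bytes) -> str:
    """Return an informal format label for the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Open the file read-only and inspect only its first 16 bytes."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(16))


def survey(root: str):
    """Yield (path, extension as parsed from the name, sniffed format)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            extension = os.path.splitext(name)[1]
            yield path, extension, sniff_file(path)
```

For a file named `Chapter 1. Final draft`, `os.path.splitext` reports the extension as `. Final draft`, which is the period-mark problem described above, whereas `sniff_bytes` depends only on the content. Opening files in read-only binary mode also avoids altering them during identification. A dedicated identification tool covers far more formats than this sample table.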