Title | Script and Programming Language File Identifications |
Description | A collection of 600 script files from various sources, comprising C++, JavaScript, CSS, Perl, PHP and Python. |
Licensing | |
Owner | Collection is owned by various organisations, and includes web-harvested content |
Dataset Location | Dataset currently restricted to use at Hackathon |
Collection Champion | Andrew Fetherston, The National Archives |
Issues brainstorm | Traditional digital file format identification tools do not identify text and script files reliably, which has implications for digital preservation repositories, web archiving, etc. |
List of Issues | Potential mis-identification of files due to the presence of embedded content. The overhead of running additional file identification tools.
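
The embedded-content issue can be illustrated with a minimal sketch of content-based identification. It is not how any existing identification tool works. The `SIGNATURES` table and the `identify` function are hypothetical names introduced here, and the patterns are rough illustrative guesses, not validated signatures. The function reports every language whose marker appears, so a file with embedded content yields more than one hit.

```python
import re

# Hypothetical marker patterns, for illustration only.
# They are deliberately crude and will misfire on real-world files.
SIGNATURES = [
    ("PHP", re.compile(r"<\?php")),
    ("Perl", re.compile(r"^#!.*\bperl\b", re.M)),
    ("Python", re.compile(r"^#!.*\bpython|^\s*def \w+\(.*\):|^\s*import \w+", re.M)),
    ("C++", re.compile(r"^\s*#include\s*<\w+", re.M)),
    ("JavaScript", re.compile(r"\bfunction\s*\w*\s*\(|\bvar\s+\w+\s*=")),
    ("CSS", re.compile(r"[.#]?[\w-]+\s*\{[^}]*:[^}]*;[^}]*\}")),
]


def identify(text):
    """Return the name of every language whose marker pattern occurs in text."""
    return [name for name, pattern in SIGNATURES if pattern.search(text)]


if __name__ == "__main__":
    # A PHP file carrying embedded JavaScript matches two signatures.
    sample = "<?php echo 'x'; ?>\n<script>var n = 1;</script>\n"
    print(identify(sample))
```

Returning all matches instead of a single answer makes the ambiguity visible: the caller has to decide whether the outer file type or the embedded script takes precedence. Each extra pattern also adds a scan over the file, which relates to the overhead concern noted above.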