Metadata extraction

Version 1 by Rachel MacGregor on Sep 19, 2012 11:50.

*Title*
_The Issue title or short description. This should be the same text as the title of the page\! Keep it short, but make it descriptive so that other people can get the gist of what your issue is about, just from the title._
*Metadata extraction*

*Detailed description*
_A detailed description of the Issue. The Issue_ *{_}MUST{_}* _focus on the business or preservation driven challenge, and should not assume or prescribe a particular solution. Describe what the Issue is, why you have the Issue, and what your requirements are for a Solution. For example, it might be important that a technical solution runs on UNIX, or has a GUI front end and is easy to use, or needs to process records efficiently as the Dataset is large._
Creating code to extract metadata, in particular descriptive metadata, from files.

The dataset is representative of the whole digital repository.  We have identified that we can only manage the collections by conducting an audit of what is already there and by mapping and improving the workflow.  One way to do this is to run code that maps what metadata already exists within the existing files.

This will assist with an audit of all existing files and data, and will help to identify and prioritise where work is needed to improve or add metadata.

*Issue champion*
_Who owns the issue? Type "\[" and then your name to use autocomplete. This creates a link to your user profile on the wiki, including your contact details, so is the best way to add your name._

[~macgregorrachel]


*Other interested parties*

_Any other parties who are also interested in applying Issue Solutions to their Datasets._


*Context*
_Details of the institutional context to the Issue._


The dataset comes from a digital repository in a large local authority archives service. It is a mixture of scanned images (for which a physical original exists) and born-digital images (for which no physical original exists).  File formats are mixed, and the files contain some or no descriptive metadata.  The digital repository as a whole holds a wider variety of file formats and types.  Very little work has been done on managing the collections. The priority is to audit what is already there and to develop a strategy to manage and develop the collections in the future, with a view to implementing robust digital preservation strategies.

*Lessons Learned*
_Notes on Lessons Learned from tackling this Issue that might be useful to inform digital preservation best practice_