
{section}{column:width=50%}

h5. About these pages

The pages referenced below form a network of *Datasets*, preservation and curation *Issues* with those Datasets, and *Solutions* to those Issues. Together they capture information and requirements about concrete digital preservation and curation challenges that are present in specific datasets and collections. The experiences of solving these Issues are written up on Solution pages. These in turn link to pages in the [OPF Tool Registry|TR:Home], and to actual code that can be downloaded and re-used.

The purpose of these pages is to share experiences in solving preservation and curation problems, so that we can learn from each other, and to articulate practitioners' needs and requirements to those who are in a position to produce practical solutions to their problems.

h5. Support

The work collated on this page is supported by: [Open Preservation Foundation|http://openpreservation.org/], [Jisc|http://www.jisc.ac.uk/], [European Commission|http://cordis.europa.eu/fp7/home_en.html], [Digital Preservation Coalition|http://www.dpconline.org/], [SPRUCE Project|http://wiki.opf-labs.org/display/SPR/Home], [AQuA Project|http://wiki.opf-labs.org/display/AQuA/Home], [SCAPE Project|http://www.scape-project.eu/], and you\!
{column}
{column:width=50%}

{html}<a href="http://www.dcc.ac.uk/sites/default/files/documents/idcc13posters/Poster186.pdf"><img align="right" src="http://wiki.opf-labs.org/download/attachments/13764153/DCC+2012+poster.png"></a>{html}

h5. Practitioners need better characterisation tools

Analysis of the Datasets, Issues and Solutions collated on this page indicated a broad cross-section of preservation requirements, but an overriding need for more effective characterisation. Practitioners need to understand more about their data and its condition, typically for quality assurance, appraisal and assessment, and for identifying preservation risks. This analysis and its conclusions are described in this poster, published at the 8th International Digital Curation Conference, Amsterdam, January 2013.

h5. Get involved

Anyone can contribute to these pages. All you need to do is&nbsp;[register for an OPF account|KB:Joining the OPF Labs site]&nbsp;(it's quick, free, and anyone can do it), and then start adding comments, adding value to existing pages, or contributing new ones. Please help us make this a valuable resource for all\!
{column}{section}



{section}{column}

h1. Datasets

!dataset2.png|align=left!
These are the Datasets or collections that relate to specific preservation Issues which in turn (may) have Solutions developed for them. The Datasets are categorised by their media type. [Click this link to create a new Dataset, then edit the italicised text|http://wiki.opf-labs.org/pages/createpage-entervariables.action?spaceKey=REQ&templateId=8617991&fromPageId=8356148].
----
h4. Audio datasets

Label:&nbsp;[audio|http://wiki.opf-labs.org/label/audio]

{contentbylabel:labels=+dataset +audio|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Disk image datasets

Label:&nbsp;[disk_image|http://wiki.opf-labs.org/label/disk_image]

{contentbylabel:labels=+dataset +disk_image|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Document datasets

Label:&nbsp;[document|http://wiki.opf-labs.org/label/document]


{contentbylabel:labels=+dataset +document|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Email datasets

Label:&nbsp;[email|http://wiki.opf-labs.org/label/email]

{contentbylabel:labels=+dataset +email|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Geodatasets

Label:&nbsp;[geodata|http://wiki.opf-labs.org/label/geodata]

{contentbylabel:labels=+dataset +geodata|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}


h4. Image datasets

Label:&nbsp;[image|http://wiki.opf-labs.org/label/image]

{contentbylabel:labels=+dataset +image|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Mixed/Misc datasets

Label:&nbsp;[mixed_misc|http://wiki.opf-labs.org/label/mixed_misc]
{contentbylabel:labels=+dataset +mixed_misc|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Research datasets

Label:&nbsp;[researchdata|http://wiki.opf-labs.org/label/researchdata]

{contentbylabel:labels=+dataset +researchdata|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Software datasets

Label:&nbsp;[software|http://wiki.opf-labs.org/label/software]

{contentbylabel:labels=+dataset +software|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Web datasets

Label:&nbsp;[web|http://wiki.opf-labs.org/label/web]

{contentbylabel:labels=+dataset +web|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Video datasets

Label:&nbsp;[video|http://wiki.opf-labs.org/label/video]

{contentbylabel:labels=+dataset +video|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Other / untagged
{contentbylabel:labels=+dataset -image -document -audio -video -software -email -web -researchdata -disk_image -geodata -mixed_misc|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}
{column}
{column}

h1. Issues

!issue3.png|align=left!
These are the preservation or other business-driven Issues that relate to a specific Dataset and may have one or more Solutions developed to solve them. The aim of an Issue page is to provide a detailed description of the preservation challenge and of the Issue Owner's requirements, to help inform the development of a Solution that solves the Issue.
[Click this link to create a new Issue, then edit the italicised text|http://wiki.opf-labs.org/pages/createpage-entervariables.action?spaceKey=REQ&templateId=8617992&fromPageId=8356152].
----
h4. Unsolved issues

Issues that do not have linked solutions. Why not suggest or contribute a solution?

Label: [unsolved_issue|http://wiki.opf-labs.org/label/unsolved_issue]
{contentbylabel:labels=+issue +unsolved_issue|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Appraisal and assessment issues

Issues related to appraising or assessing digital content as the first step in deciding how to proceed with preservation activities.

Label: [appraisal_assessment|http://wiki.opf-labs.org/label/appraisal_assessment]
{contentbylabel:labels=+issue +appraisal_assessment|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Bit rot issues

Issues related to Datasets that exhibit bit rot (files damaged by imperfect storage, failed write operations or software/processing errors) and require a Solution to identify, and if possible repair, problematic files.

Label: [bit_rot|http://wiki.opf-labs.org/label/bit_rot]
{contentbylabel:labels=+issue +bit_rot|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Conformance issues

Issues where Dataset content does not match a required profile, or needs to be checked or validated against a particular profile. These profiles are typically determined by an organisation's collection or preservation policy.

Label: [conformance|http://wiki.opf-labs.org/label/conformance]
{contentbylabel:labels=+issue +conformance|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Contextual issues

Issues related to the wider context of a particular Dataset.

Label: [context|http://wiki.opf-labs.org/label/context]
{contentbylabel:labels=+issue +context|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Data capture issues

Issues related to the capture, harvesting or extraction of data in order to facilitate effective preservation and access.

Label: [data_capture|http://wiki.opf-labs.org/label/data_capture]

{contentbylabel:labels=+issue +data_capture|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Duplication issues

Duplicated files can arise from a number of causes. Identical duplicates are relatively easy to detect. Similar duplicates (e.g. one file processed from another, or the same item scanned on a different device) can require much more complicated Solutions.

Label: [duplication|http://wiki.opf-labs.org/label/duplication]
{contentbylabel:labels=+issue +duplication|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Embedded objects issues

Objects embedded within other objects (such as OLE, OOXML, PDF, ZIP) can pose identification, appraisal or risk assessment challenges.

Label: [embedded_objects|http://wiki.opf-labs.org/label/embedded_objects]

{contentbylabel:labels=+issue +embedded_objects|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}


h4. External dependency issues

Issues relating to digital objects that have dependencies on other objects or content on the web.

Label: [dependency|http://wiki.opf-labs.org/label/dependency]

{contentbylabel:labels=+issue +dependency|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Integrity issues

Issues relating to ensuring the integrity or fixity of Datasets.

Label: [integrity|http://wiki.opf-labs.org/label/integrity]
{contentbylabel:labels=+issue +integrity|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Obsolescence, preservation risk and business constraint issues

Issues that relate to the obsolescence of Datasets, preservation risk or business constraints placed on the way that Datasets are managed.

Label: [obsolescence|http://wiki.opf-labs.org/label/obsolescence]
{contentbylabel:labels=+issue +obsolescence|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Planning and management issues

Issues that relate to the general planning and management of digital preservation.

Label: [planning_management|http://wiki.opf-labs.org/label/planning_management]&nbsp;

{contentbylabel:labels=+issue +planning_management|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}


h4. Quality issues

Issues relating to Datasets containing quality issues caused by digitisation, processing or format migration.

Label: [qa|http://wiki.opf-labs.org/label/qa]
{contentbylabel:labels=+issue +qa|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Retention/disposal issues

Issues relating to the retention, disposal and/or deletion of digital objects.

Label:&nbsp;[retention|http://wiki.opf-labs.org/label/retention]
{contentbylabel:labels=+issue +retention|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Rights issues

Issues related to rights or permissions that cause difficulties in managing or preserving Datasets.

Label: [rights|http://wiki.opf-labs.org/label/rights]
{contentbylabel:labels=+issue +rights|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Structural relationship issues

Digital entities can be made up of a number of objects (e.g. masters, service copies, metadata). Structural relationships are important for understanding which objects are part of an entity and what they are for.

Label: [structural_relationships|http://wiki.opf-labs.org/label/structural_relationships]
{contentbylabel:labels=+issue +structural_relationships|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. System obsolescence issues

Issues related to the obsolescence of software or other systems that manage Datasets.

Label: [system_obsolescence|http://wiki.opf-labs.org/label/system_obsolescence]
{contentbylabel:labels=+issue +system_obsolescence|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Unknown characteristics issues

Issues related to Datasets with unknown characteristics that are necessary for a preservation, management or other business need.

Label: [unknown_characteristics|http://wiki.opf-labs.org/label/unknown_characteristics]
{contentbylabel:labels=+issue +unknown_characteristics|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Unknown file formats issues

Datasets containing unknown file formats tend to pose a preservation risk and make management of them difficult.

Label: [unknown_file_formats|http://wiki.opf-labs.org/label/unknown_file_formats]
{contentbylabel:labels=+issue +unknown_file_formats|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Value and cost issues

Issues relating to the cost of Dataset management or the value of the Dataset to its owners and users.

Label: [value_cost|http://wiki.opf-labs.org/label/value_cost]
{contentbylabel:labels=+issue +value_cost|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}


h4. Other / untagged

Issues that haven't been tagged with any of the labels listed in this column. This provides a useful mechanism for catching Issues that have not been tagged with sufficient detail, or identifying the need to add new labels to this page.
{contentbylabel:labels=+issue -appraisal_assessment -obsolescence -context -dependency -conformance -system_obsolescence -retention -rights -qa -unknown_characteristics -unknown_file_formats -integrity -embedded_objects -duplication -data_capture -bit_rot -planning_management -unsolved_issue -structural_relationships -value_cost|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}


{column}
{column}



h1. Solutions

!solution2.png|align=left!
These are Solutions that address specific Issues encountered in particular Datasets. Solutions are typically quite specific to a particular Issue and Dataset but many will have a wider application. For details of tools utilised in a Solution, either follow links from individual Solution pages or see the [Tool Registry|TR:Digital Preservation Tool Registry].
[Click this link to create a new Solution, then edit the italicised text|http://wiki.opf-labs.org/pages/createpage-entervariables.action?spaceKey=REQ&templateId=8617994&fromPageId=8356159].
----
h4. Appraisal and assessment solutions

Solutions for assessing or appraising datasets.

Label:&nbsp;[appraisal_assessment|http://wiki.opf-labs.org/label/appraisal_assessment]
{contentbylabel:labels=+solution +appraisal_assessment|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Bit rot detection and repair solutions

Solutions for detecting and possibly repairing bit rot Issues in Datasets.

Label:&nbsp;[bit_rot_detection|http://wiki.opf-labs.org/label/bit_rot_detection]
{contentbylabel:labels=+solution +bit_rot_detection|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Characterisation solutions

Solutions for characterising content.

Label:&nbsp;[characterisation|http://wiki.opf-labs.org/label/characterisation]
{contentbylabel:labels=+solution +characterisation|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Data capture solutions

Solutions for capturing data from an external source, or imaging data from hand-held media.

Label:&nbsp;[data_capture|http://wiki.opf-labs.org/label/data_capture]
{contentbylabel:labels=+solution +data_capture|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. De-duplication solutions

Solutions for detecting and managing duplicated digital objects or datasets.

Label:&nbsp;[de-duplication|http://wiki.opf-labs.org/label/de-duplication]
{contentbylabel:labels=+solution +de-duplication|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Embedded object solutions

Solutions for managing and preserving embedded digital objects.

Label:&nbsp;[embedded_objects|http://wiki.opf-labs.org/label/embedded_objects]
{contentbylabel:labels=+solution +embedded_objects|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Emulation solutions

Solutions utilising emulation or virtualisation technologies.

Label:&nbsp;[emulation|http://wiki.opf-labs.org/label/emulation]
{contentbylabel:labels=+solution +emulation|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. File format identification solutions

Solutions for identifying file formats.

Label:&nbsp;[identification|http://wiki.opf-labs.org/label/identification]
{contentbylabel:labels=+solution +identification|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Fixity solutions

Solutions for addressing integrity issues using approaches for generating and verifying fixity information such as manifests and checksums.

Label:&nbsp;[fixity|http://wiki.opf-labs.org/label/fixity]
{contentbylabel:labels=+solution +fixity|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Migration solutions

Solutions for migrating data from one format to another.

Label:&nbsp;[migration|http://wiki.opf-labs.org/label/migration]
{contentbylabel:labels=+solution +migration|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Miscellaneous solutions

Solutions for miscellaneous topics.

Label:&nbsp;[miscellaneous|http://wiki.opf-labs.org/label/miscellaneous]
{contentbylabel:labels=+solution +miscellaneous|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Quality assurance solutions

Solutions for assessing or identifying quality Issues in Datasets.

Label:&nbsp;[quality_assurance|http://wiki.opf-labs.org/label/quality_assurance]
{contentbylabel:labels=+solution +quality_assurance|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Rights management solutions

Solutions for managing permissions and rights Issues.

Label:&nbsp;[rights|http://wiki.opf-labs.org/label/rights]
{contentbylabel:labels=+solution +rights|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Structural relationship solutions

Solutions for preserving or checking the structural relationships between digital objects belonging to a particular entity.

Label:&nbsp;[structural_relationships|http://wiki.opf-labs.org/label/structural_relationships]
{contentbylabel:labels=+solution +structural_relationships|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}

h4. Validation solutions

Solutions for validating the conformance of digital objects to file format specifications or institutional profiles.

Label:&nbsp;[validation|http://wiki.opf-labs.org/label/validation]
{contentbylabel:labels=+solution +validation|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}


h4. Other / untagged

Solutions that haven't been tagged with any of the labels listed in this column. This provides a useful mechanism for catching Solutions that have not been tagged with sufficient detail, or identifying the need to add new labels to this page.
{contentbylabel:labels=+solution -appraisal_assessment -bit_rot_detection -characterisation -data_capture -de-duplication -embedded_objects -emulation -fixity -identification -miscellaneous -migration -quality_assurance -rights -structural_relationships -validation|showLabels=false|showSpace=false|max=999|sort=modified|reverse=true|restrict=@all}
{column}{column}
{column}
{section}
{recently-updated:spaces=@all|labels=dataset issue solution}