Moving records from Sharepoint to Eprints for preservation solution

Title
Moving records from Sharepoint to Eprints for preservation

Detailed description
It was not feasible to come up with a working solution to convert SharePoint (SP) content to Eprints within 3 days, since SP is a very complex CMS.

In the end we created a new "view" through SP list administration containing all desired fields, such as author, date and content body, and exported it to Excel. The resulting Excel sheet is sufficient for further steps.

Before creating "Digital Preservation Ready" Eprints records, it would be wise to adapt the HTML content to make it more semantic. Some articles use bold (<b>) tags to indicate a header or subheader instead of real HTML heading elements (<h1>, <h2>, ...).
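As an illustration of the kind of clean-up meant here, the following is a minimal Python sketch, not a tested migration tool. It assumes that a "fake" heading is a paragraph whose entire content is one bold element containing plain text; the function name and the default heading level are hypothetical, and real SP output may well need a proper HTML parser instead of a regular expression.

```python
import re

# Assumption: a pseudo-heading is a <p> whose only child is one <b> or
# <strong> element containing plain text (no nested tags).
_BOLD_ONLY_PARAGRAPH = re.compile(
    r"<p(?:\s[^>]*)?>\s*<(b|strong)>\s*([^<]+?)\s*</\1>\s*</p>",
    flags=re.IGNORECASE,
)


def promote_bold_to_headings(html: str, level: int = 2) -> str:
    """Replace bold-only paragraphs with <hN> elements of the given level."""

    def _to_heading(match: re.Match) -> str:
        return f"<h{level}>{match.group(2)}</h{level}>"

    return _BOLD_ONLY_PARAGRAPH.sub(_to_heading, html)


if __name__ == "__main__":
    sample = "<p><b>Introduction</b></p><p>Some <b>bold</b> text.</p>"
    print(promote_bold_to_headings(sample))
```

Bold text that merely emphasises a word inside a sentence is left alone, because the pattern only matches when the bold element fills the whole paragraph. Choosing the right heading level per article would still need human review.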

This also implies that future content should be formatted properly upon creation, which means users need to be educated and made aware of this type of issue.

The exported body contents sometimes contain internal and external links. This could cause trouble in the future if the linked resources become unavailable. One approach would be to download all linked resources, regardless of object type. External websites or single pages could be harvested using existing web harvesting solutions. Images could be saved through a web browser or an automated script.
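A first step towards such an automated script could be an inventory of what each article links to. The sketch below uses only the Python standard library and assumes that the exported body is HTML and that the base URL of the SP site is known; the class and function names, and the example URLs, are hypothetical. It only lists resources and does not download anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collect the targets of <a href> and <img src> from an HTML body."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.resources = []  # list of (kind, absolute_url)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.resources.append(("link", urljoin(self.base_url, attributes["href"])))
        elif tag == "img" and attributes.get("src"):
            self.resources.append(("image", urljoin(self.base_url, attributes["src"])))


def list_resources(html: str, base_url: str):
    """Return (kind, scope, url) tuples; scope is 'internal' or 'external'."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    base_host = urlparse(base_url).netloc
    found = []
    for kind, url in collector.resources:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar targets
        scope = "internal" if parsed.netloc == base_host else "external"
        found.append((kind, scope, url))
    return found


if __name__ == "__main__":
    body = '<p><a href="/docs/a.pdf">A</a> <img src="http://example.org/x.png"/></p>'
    for entry in list_resources(body, "http://sp.example.com/site/"):
        print(entry)
```

Splitting the inventory into internal and external resources matters because internal objects can probably be exported from SP directly, while external ones have to be harvested from the web.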

After harvesting both the internal and external resources, the appropriate preservation strategy for each object should be applied.

BASIC ROADMAP TO GET STARTED

  • export custom view with desired fields to Excel
  • determine per article if it is necessary to harvest any internal/external objects such as links, images, webpages, etc.
  • spruce up HTML content bodies (add heading levels, etc., to make them semantic); maybe convert to Word documents?
  • spruce up metadata (author, date, ...) to make it Eprints ready
  • convert to Eprints
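The metadata step in the roadmap above could look roughly like the following Python sketch. Everything specific in it is an assumption: the column names ("Title", "Author", "Date"), the date format of the export, and the output keys are hypothetical placeholders, not actual SP export headers or Eprints field names, and would have to be checked against the real Excel sheet and the target repository configuration.

```python
from datetime import datetime


def normalise_row(row: dict, date_format: str = "%d/%m/%Y") -> dict:
    """Tidy one exported row: collapse whitespace and use ISO 8601 dates.

    The input column names and output keys are hypothetical placeholders.
    """
    author = " ".join(row["Author"].split())  # collapse stray whitespace
    created = datetime.strptime(row["Date"].strip(), date_format).date()
    return {
        "title": row["Title"].strip(),
        "creator": author,
        "date": created.isoformat(),
    }


if __name__ == "__main__":
    exported = {"Title": " Test article ", "Author": "  Jane   Doe ", "Date": "05/03/2012"}
    print(normalise_row(exported))
```

Rows whose date does not match the assumed format raise a ValueError, which is arguably preferable to silently importing wrong dates into the repository.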

Solution Champion
Maurice de Rooij

Corresponding Issue(s)
Moving records from Sharepoint to Eprints for preservation

Tool/code link
There are some commercial and non-commercial packages around which are able to archive a SP site, but we haven't looked into them yet.

Evaluation
Any notes or links on how the solution performed.
