Title
Moving records from Sharepoint to Eprints for preservation
Detailed description
It was not feasible to build a working SharePoint (SP) to EPrints converter within 3 days, since SP is a very complex CMS.
Instead, we created a new "view" through SP list administration and exported it to Excel. This view contains all desired fields, such as author, date and content body. The view is saved as an Excel sheet, which is sufficient for the further steps.
Before creating "Digital Preservation Ready" EPrints records, it would be wise to adapt the HTML content to make it more semantic. Some articles use bold (<b>) tags to indicate a header or subheader instead of using real HTML heading elements (<h1>, <h2>, ...).
This also implies that future content should be formatted properly upon creation, which means users need to be educated and made aware of this type of issue.
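As an illustration of the clean-up described above, the following is a minimal Python sketch, not a tested migration tool. It assumes that a "pseudo-heading" is a paragraph whose only content is a single <b> or <strong> element containing plain text; the function name promote_bold_headings and the default heading level are hypothetical choices, and real SP exports may need looser or stricter rules.

```python
import re

# Assumption: a pseudo-heading is a <p> that contains nothing but one
# <b>/<strong> element with plain text (no nested tags) inside it.
PSEUDO_HEADING = re.compile(
    r"<p(?:\s[^>]*)?>\s*<(b|strong)>\s*([^<]+?)\s*</\1>\s*</p>",
    re.IGNORECASE,
)


def promote_bold_headings(html: str, level: int = 2) -> str:
    """Replace bold-only paragraphs with a real heading element."""

    def to_heading(match: re.Match) -> str:
        text = match.group(2)
        return f"<h{level}>{text}</h{level}>"

    return PSEUDO_HEADING.sub(to_heading, html)


if __name__ == "__main__":
    sample = "<p><b>Background</b></p><p>Some <b>bold</b> text.</p>"
    print(promote_bold_headings(sample))
```

Bold text inside a normal sentence is deliberately left alone, so the result should still be reviewed by hand for headings the pattern missed.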
The exported body contents sometimes contain internal and external links. This could cause trouble in the future if these resources become unavailable. One approach would be to download all linked resources, no matter what object type. External websites or single pages could be harvested using existing web harvesting solutions. Images could be saved through a web browser or an automated script.
After harvesting both the internal and external resources, the appropriate preservation strategy should be applied to each object.
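To decide what needs harvesting, the links in each exported body first have to be listed. The sketch below shows one possible way to do that with Python's standard library. It assumes the body is available as an HTML string and that the site's base URL is known; LinkCollector, classify_links and the example URLs are hypothetical names used for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags and src values of <img> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def classify_links(body_html, site_base):
    """Return (internal, external) lists of absolute http(s) URLs."""
    collector = LinkCollector()
    collector.feed(body_html)
    site_host = urlparse(site_base).netloc.lower()
    internal, external = [], []
    for url in collector.urls:
        absolute = urljoin(site_base, url)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar
        target = internal if parts.netloc.lower() == site_host else external
        target.append(absolute)
    return internal, external


if __name__ == "__main__":
    body = '<p><a href="/docs/report.pdf">Report</a> <img src="http://example.org/a.png"></p>'
    print(classify_links(body, "http://intranet.example.com/sites/news/"))
```

The two lists can then be handed to whatever download script or web harvesting tool is chosen; this sketch only builds the inventory and fetches nothing.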
BASIC ROADMAP TO GET STARTED
- export custom view with desired fields to Excel
- determine per article if it is necessary to harvest any internal/external objects such as links, images, webpages, etc.
- spruce up HTML content bodies (add heading levels, etc. to make them semantic), maybe convert to Word documents?
- spruce up metadata (author, date, ...) to make it EPrints ready
- convert to Eprints
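The metadata step in the roadmap above could be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure that was actually used: the column names "Title", "Author" and "Date", the date formats, the semicolon as author separator, and the output keys are all hypothetical and would need to be checked against the real export and the target EPrints configuration.

```python
from datetime import datetime

# Assumption: dates in the export use one of these formats.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y %H:%M")


def normalise_date(raw):
    """Return the date as YYYY-MM-DD, or None if no format matches."""
    raw = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalise_row(row):
    """Clean one exported row (a dict keyed by assumed column names)."""
    authors = [a.strip() for a in row.get("Author", "").split(";") if a.strip()]
    return {
        "title": row.get("Title", "").strip(),
        "creators": authors,
        "date": normalise_date(row.get("Date", "")),
    }


if __name__ == "__main__":
    example = {"Title": "  Annual report ", "Author": "A. Smith; B. Jones ", "Date": "03/11/2011"}
    print(normalise_row(example))
```

Rows whose date comes back as None are best flagged for manual review rather than guessed at.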
Solution Champion
Maurice de Rooij
Corresponding Issue(s)
Moving records from Sharepoint to Eprints for preservation
Tool/code link
There are some commercial and non-commercial packages that can archive an SP site, but we haven't looked into them yet.
- http://www.sharearchiver.com/sharepoint-archiving.html
- http://vrearchive.codeplex.com/
- http://store.bamboosolutions.com/p-63-list-bulk-export.aspx
Evaluation
Any notes or links on how the solution performed.