
| *Title* \\ | _IS12 ARC to WARC migration_ \\ |
| *Detailed description* | _Migration from ARC to WARC is desirable because the WARC format is better suited to the future of web archiving._ |
| *Scalability Challenge* \\ | _ARC and WARC are both container formats. At present, SB has around 200 TB of web content that needs to be migrated._ |
| *[Issue champion|SP:Responsibilities of the roles described on these pages]* | _[Per Møldrup-Dalum|https://portal.ait.ac.at/sites/Scape/_layouts/userdisp.aspx?ID=72&Source=https%3A%2F%2Fportal%2Eait%2Eac%2Eat%2Fsites%2FScape%2F%5Flayouts%2Fpeople%2Easpx%3FMembershipGroupId%3D5] (SB)_ |
| *Other interested parties* \\ | _During the IIPC Preservation Working Group (PWG) meeting of 6-10-2011 this topic was discussed, and Barbara Sierman and Sven Schlarb updated the group on Scape activities. The BNF is preparing its ARC-to-WARC migration and is creating a mapping between the two formats. If this issue is taken up, it would be worth contacting the PWG via its chair, Clement Oury (BNF). The BNF has also created a JHOVE2 module for ARC, and the IIPC has been asked to fund the development of a JHOVE2 module for WARC. This combination might be interesting for the scenario (update by Barbara Sierman)._ |
| *Possible Solution approaches* | KEEPS: \\
* For format conversion the following tools are available:
** warc-tools (this tool was not selected in D10.1 because a license was missing, but that will probably change in D10.2) \\
SB: \\
* Since we will never be able to afford to keep the old ARC files, we need to be very sure that the resulting WARCs correspond 100% to the original ARCs \\
Thus: we need a QA tool that checks, record by record, that the content is the same |
| *Context* | _TBD_ |
| *Lessons Learned* | |
| *Training Needs* | \\ |
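
The record-by-record QA check asked for under "Possible Solution approaches" could look roughly like the sketch below. It is only an illustration and not an existing tool: it assumes the ARC and WARC records have already been parsed into (URI, payload) pairs in the same order, and that WARC-only metadata records have been filtered out beforehand. The names {{Record}}, {{payload_digest}} and {{compare_records}} are hypothetical, and SHA-1 is just one possible choice of digest.

```python
import hashlib
from itertools import zip_longest
from typing import Iterable, List, Tuple

# Hypothetical record shape: (target URI, payload bytes).
# Parsing the ARC/WARC containers themselves is out of scope here.
Record = Tuple[str, bytes]


def payload_digest(payload: bytes) -> str:
    """Return a hex SHA-1 digest of a record payload."""
    return hashlib.sha1(payload).hexdigest()


def compare_records(arc_records: Iterable[Record],
                    warc_records: Iterable[Record]) -> List[str]:
    """Compare two record streams pairwise and return a list of problems.

    An empty list means every ARC record has a WARC counterpart with the
    same URI and the same payload digest, in the same position.
    """
    problems = []
    pairs = zip_longest(arc_records, warc_records)
    for index, (arc, warc) in enumerate(pairs):
        if arc is None:
            problems.append(f"record {index}: extra record in WARC ({warc[0]})")
        elif warc is None:
            problems.append(f"record {index}: missing from WARC ({arc[0]})")
        elif arc[0] != warc[0]:
            problems.append(f"record {index}: URI differs ({arc[0]} vs {warc[0]})")
        elif payload_digest(arc[1]) != payload_digest(warc[1]):
            problems.append(f"record {index}: payload differs ({arc[0]})")
    return problems
```

Because the function consumes plain iterables, it can be fed from streaming readers, which matters at the 200 TB scale mentioned above. An ARC file would only be considered safe to delete when the returned list is empty.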