| *{-}Title{-}* \\ | _{-}IS38{-}_ -(W)ARC to HBase migration- |
| *{-}Detailed description{-}* | _{-}Planned migration of (W)ARC content to a new infrastructure based on HBase{-}_ \\ |
| *{-}Scalability Challenge{-}* \\ | _{-}Around 200 TB of web data needs to be migrated while continuity of service is maintained.-_ \\ |
| *-[Issue champion|SP:Responsibilities of the roles described on these pages]-* | _-[Leïla Medjkoune|]-_ _-(IM)-_ |
| *{-}Other interested parties{-}* \\ | -Comment from Bjarne (SB): Isn't this "just" about unpacking content from (W)ARC and putting it into HBase? I see no real need for structural and visual comparison. Aren't all objects going to be 100% the same as the originals?- |
| *{-}Possible Solution approaches{-}* | _{-}UPMC Structural and visual comparison{-}_ \\ |
| *{-}Context{-}* | -IM is migrating its web content, currently stored in (W)ARC files, to a new infrastructure based on HBase.- \\
-The archive contains around 200 TB of data and is growing rapidly. Most of the crawled content will need to be migrated sometime this year.- \\
-Once the new infrastructure is ready, the services IM provides to cultural institutions will have to rely on it. The Foundation currently provides a high-quality archive and related services, such as redirection from content missing on the live web to the archive, or resolution of access issues through its access tool.- \\ \\
-Given the investment in manual quality assurance, crawl preparation and development, a loss of quality after the content is migrated to the new infrastructure is not acceptable.- \\ \\
-We are therefore planning to build a “quality test” migration using tools and methodologies developed by UPMC to detect and repair migration defects, as described in the WP11 work description.- |
| *{-}Datasets{-}* | -[IM Web Archive |SP:Internet Memory Web Archive]-\\ |
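
The question raised above, whether migrated objects are byte-for-byte identical to the originals, could be checked with a digest comparison along the following lines. This is only a minimal sketch and not the UPMC structural and visual comparison: the two mappings are hypothetical stand-ins for payloads read from the (W)ARC files and from the HBase store, the function names are invented for illustration, and the use of SHA-256 is an assumption.

```python
import hashlib
from typing import Dict, Mapping


def payload_digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a record payload."""
    return hashlib.sha256(payload).hexdigest()


def find_migration_defects(
    source: Mapping[str, bytes], migrated: Mapping[str, bytes]
) -> Dict[str, str]:
    """Compare each source payload with its migrated copy.

    Both arguments map a record key (hypothetical, e.g. URL plus
    capture timestamp) to the raw payload bytes.
    Returns a dict mapping record key to "missing" or "altered";
    an empty dict means every source record was found unchanged.
    """
    defects: Dict[str, str] = {}
    for key, payload in source.items():
        copy = migrated.get(key)
        if copy is None:
            # The record never reached the new store.
            defects[key] = "missing"
        elif payload_digest(copy) != payload_digest(payload):
            # The record exists but its bytes differ from the original.
            defects[key] = "altered"
    return defects
```

A check of this kind only detects missing or altered bytes. It says nothing about whether a page still renders or replays correctly from the new infrastructure, which is the aspect the structural and visual comparison is meant to address.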
