compared with
Current by Leïla Medjkoune
on Dec 13, 2012 16:29.

Changes (19)

| *Title* \\ | _IS38_ (W)ARC to HBase migration |
| *{-}Title{-}* \\ | _{-}IS38{-}_ -(W)ARC to HBase migration- |
| *{-}Detailed description{-}* | _{-}Planned migration from (W)ARC content to a new infrastructure based on HBase{-}_ \\ |
| *{-}Scalability Challenge{-}* \\ | _{-}Around 200 TB of Web data needs to be migrated, and continuity of service must be maintained.{-}_ \\ |
| *[Issue champion|SP:Responsibilities of the roles described on these pages]* | _[Leïla Medjkoune|https://portal.ait.ac.at/sites/Scape/Management/_layouts/userdisp.aspx?ID=69&Source=https%3A%2F%2Fportal.ait.ac.at%2Fsites%2FScape%2FManagement%2F_layouts%2Fpeople.aspx%3FMembershipGroupId%3D5]_ _(IM)_ |
| *-[Issue champion|SP:Responsibilities of the roles described on these pages]-* | _-[Leïla Medjkoune|https://portal.ait.ac.at/sites/Scape/Management/_layouts/userdisp.aspx?ID=69&Source=https%3A%2F%2Fportal.ait.ac.at%2Fsites%2FScape%2FManagement%2F_layouts%2Fpeople.aspx%3FMembershipGroupId%3D5]-_ _-(IM)-_ |
| *{-}Other interested parties{-}* \\ | -Comment from Bjarne (SB): Isn't this "just" about unpacking content from (W)ARC and putting it into HBase? I see no real need for structural and visual comparison. All objects are going to be 100% the same as the original?- |
| *Possible Solution approaches* | _UPMC Structural and visual comparison_ \\ |
| *{-}Possible Solution approaches{-}* | _{-}UPMC Structural and visual comparison{-}_ \\ |
| *{-}Context{-}* | -IM is migrating its web content, currently stored in (W)ARC files, to a new infrastructure based on HBase.- \\
-The archive contains around 200 TB of data and is growing rapidly. Most of the crawled content will need to be migrated sometime this year.- \\
-Once the new infrastructure is ready, services provided to cultural institutions by IM will have to rely on it. The Foundation currently provides a high-quality archive and related services, such as redirection from missing live content to the archive or resolution of access issues through its access tool.- \\ \\
\\
-Given the investment in manual quality assurance, crawl preparation and development, a loss of quality after content is migrated to this new infrastructure is not acceptable.- \\ \\
\\
-We are therefore planning to build a “quality test” migration, using tools and methodologies developed by UPMC to detect and repair migration defects, as described in the WP11 work description.- |
| *Lessons Learned* | \\ |
| *Training Needs* | \\ |
| *{-}Lessons Learned{-}* | \\ |
| *{-}Training Needs{-}* | \\ |
| *{-}Datasets{-}* | -[IM Web Archive |SP:Internet Memory Web Archive]-\\ |
| *Solutions* | \\ |