Label: webarchive

All content with label webarchive.
Related Labels: planning, hadoop, lsdr, representationinformation, characterisation, watch, identification, obsolescence, issue, tool, qa, formatprofile, arc, database, researchdata, unknown_characteristics, unknown_file_formats, dataset, scenario

Page: Austrian National Library - Web Archive (SCAPE)
Title: Austrian National Library Web Archive. Description: The Austrian National Library uses representative datasets from its web archive: event selective crawls: sites harvested frequently during an event, e.g. EU election 2009, Olympia ...
Other labels: arc
Page: DeepArc (The Registry)
Summary. Purpose: Intended for preserving web sites from the back end, this is a database-to-XML curation tool. Homepage: http://deeparc.sourceforge.net/ Source Code Repository: http://source.repository.url/ License: The license, which should also be a tag. Debian Package: The Debian package, if any ...
Other labels: database, tool
Page: IS12 ARC to WARC migration (SCAPE)
Title: IS12 ARC to WARC migration. Detailed description: Migration from ARC to WARC is desirable because the WARC format is better suited to the future of web archiving. Scalability Challenge: ARC and WARC are both container formats. At present SB has ...
Other labels: qa, issue, planning, watch, obsolescence
Page: IS14 Diverse preservation risks in large archives with millions of objects (SCAPE)
Title: IS14 Diverse preservation risks in large archives with millions of objects. Detailed description: While we ingested millions of objects in the past, we expanded our knowledge about the risks of the objects. However, before we could make a decision ...
Other labels: characterisation, identification, issue, watch, obsolescence
Page: IS17 Characterisation of text-based formats (SCAPE)
Title: IS17 Characterisation of text-based formats. Detailed description: Problem: it is increasingly common for scientific journal articles (which are usually in PDF format) to be accompanied by supplemental files. These are often research data, software source code, or scripts. In the majority of cases ...
Other labels: identification, lsdr, issue, unknown_file_formats
Page: IS25 Web Content Characterisation (SCAPE)
Title: IS25 Web Content Characterisation. Detailed description: The main issue with web content is that web archive data is very heterogeneous. Depending on the policy of the institution, the data contains text documents in all kinds of text encodings, HTML content ...
Other labels: characterisation, identification, issue, obsolescence
Page: IS26 Dealing with difficult identification cases (SCAPE)
Title: Dealing with difficult identification cases. Detailed description: Identification Requirements, Format Languages, Requirements and Difficult Cases. Mutants and wild types. Strains. See below for specific examples. Scalability Challenge: The solution must be able to identify and describe the large ...
Other labels: lsdr, identification, issue, unknown_file_formats
Page: IS41 Analyse huge text files containing information about a web archive (SCAPE)
Title: IS41 Analyse huge text files containing information about a web archive. Detailed description: Some web archives produce information about their content on a periodic basis. The result is sometimes stored as huge text files ...
Other labels: issue, hadoop, characterisation, unknown_characteristics
Page: IS5 Digital objects archive contains unidentified content (SCAPE)
Title: Digital objects archive contains unidentified content. Detailed description: From an archiving point of view, if there is no detailed information about the exact content of an archive, no preservation planning or preservation actions can be undertaken. For example, if old ...
Other labels: characterisation, identification, issue, watch, obsolescence
Page: IS6 Determine render-ability of displayable web objects (SCAPE)
Title: Determine render-ability of displayable web objects. Detailed description: Whether a digital object can be rendered depends on standards, agreements, and understandings in interfaces and hardware, and there are strong interdependencies between these conditions. Because of these technical dependencies, the content of the web archive might not be renderable ...
Other labels: characterisation, issue, obsolescence