Although migrating library applications to a Cloud environment is not an easy task, many libraries are interested in using Cloud infrastructure services broadly across their operations, whether on a Public, Private or Hybrid Cloud. One expectation of such a migration is the scalability of digital preservation architectures in Cloud environments. Nevertheless, deploying a digital library platform, SCAPE in particular, in large data centres or in Cloud Computing environments raises specific challenges: dynamic allocation of components to compute nodes, platform monitoring, and quality of service, to name the most recurrent ones. Automated cluster provisioning and platform deployment offer a way forward, although they require the integration and extension of a variety of tools, such as:
- a node deployment system, such as Cobbler, that helps administrators allocate nodes dynamically: systems can be added to and removed from the management of the node deployment and configuration management systems on the fly, both on bare-metal computing hardware and on virtualized computing resources.
- an on-the-fly software deployment system based on a customized Configuration Management System (Puppet) that supports the evolution of SCAPE software packages by providing “high-level” recipes describing the tools and the relations between them. It enables dynamic allocation of SCAPE components to computing resources with minimal human intervention, which makes the software deployment process more deterministic and ensures that software is deployed as its developers intended.
- monitoring and quality-measurement tools: the integrated Puppet Configuration Management system natively supports integration with, and deployment of, the Nagios monitoring solution. This allows operators and administrators to provide better QoS.
The next section presents the SCAPE Cloud Deployment Toolkit which, as the name suggests, is a set of tools that supports automatic deployment of SCAPE components on multiple Cloud environments, thus enabling portability of SCAPE software between clouds.
The Cloud Deployment Toolkit provides system administrators with a user-friendly GUI that uses an orchestration layer to deploy common SCAPE components and tools on virtual machines hosted on Eucalyptus or Amazon EC2 (these are the environments we have tested so far, but the toolkit can be extended to other clouds supported by the orchestration layer). The role of the orchestration layer (Apache Libcloud) is to hide the differences between these Cloud vendors (providers) so that they are invisible from the user's point of view, and to minimize or even eliminate vendor lock-in. A major advantage of using multiple providers is increased availability of the SCAPE platform, together with the possibility of reducing costs by testing different scenarios on Eucalyptus and switching to Amazon EC2 only when necessary (e.g. if Eucalyptus is down or has performance problems).
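The fallback idea described above (prefer Eucalyptus, use Amazon EC2 only when needed) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the toolkit's actual code: `ComputeDriver`, `FakeDriver`, `ProviderUnavailable` and `create_with_fallback` are hypothetical names, and the in-memory driver merely stands in for the provider-neutral drivers that an orchestration layer such as Apache Libcloud supplies.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class ProviderUnavailable(Exception):
    """Raised by a driver when its cloud cannot serve the request."""


class ComputeDriver(Protocol):
    """Hypothetical provider-neutral interface, for illustration only."""

    name: str

    def create_node(self, node_name: str) -> str: ...


@dataclass
class FakeDriver:
    """In-memory stand-in for a real cloud driver (assumption)."""

    name: str
    available: bool = True
    nodes: List[str] = field(default_factory=list)

    def create_node(self, node_name: str) -> str:
        if not self.available:
            raise ProviderUnavailable(self.name)
        self.nodes.append(node_name)
        return f"{self.name}:{node_name}"


def create_with_fallback(drivers: List[ComputeDriver], node_name: str) -> str:
    """Try each provider in preference order; return the first node created."""
    for driver in drivers:
        try:
            return driver.create_node(node_name)
        except ProviderUnavailable:
            continue  # this provider is down, try the next one
    raise RuntimeError("no provider could create the node")


# Eucalyptus is simulated as being down, so the request falls through to EC2.
eucalyptus = FakeDriver("eucalyptus", available=False)
ec2 = FakeDriver("ec2")
node_id = create_with_fallback([eucalyptus, ec2], "scape-worker-1")
```

Because the calling code only depends on the shared interface, adding a further provider would mean adding one more driver to the preference list rather than changing the deployment logic.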
Powered by Puppet and PuppetDB, the deployment of common SCAPE components and tools requires no human interaction, so the deployment is the same on all nodes that use the same Puppet templates, regardless of the selected Cloud vendor. This reduces the overhead of debugging and testing the result of the deployment on different nodes. Concretely, the toolkit provides features that allow:
- auto-detecting the Eucalyptus environment,
- listing existing clusters,
- creating/deleting clusters and their respective nodes.
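To make the cluster operations listed above concrete, the following is a small in-memory model of them. It is only a sketch: the class `ClusterRegistry` and its method names are hypothetical and do not reflect the toolkit's real interface, and no cloud calls are made.

```python
from typing import Dict, List


class ClusterRegistry:
    """In-memory model of clusters and their nodes (illustrative only)."""

    def __init__(self) -> None:
        self._clusters: Dict[str, List[str]] = {}

    def list_clusters(self) -> List[str]:
        """Return the names of all known clusters, sorted."""
        return sorted(self._clusters)

    def create_cluster(self, name: str) -> None:
        if name in self._clusters:
            raise ValueError(f"cluster {name!r} already exists")
        self._clusters[name] = []

    def add_node(self, cluster: str, node: str) -> None:
        self._clusters[cluster].append(node)

    def remove_node(self, cluster: str, node: str) -> None:
        self._clusters[cluster].remove(node)

    def nodes(self, cluster: str) -> List[str]:
        """Return a copy of the node list of one cluster."""
        return list(self._clusters[cluster])

    def delete_cluster(self, name: str) -> List[str]:
        """Delete a cluster and return the nodes removed with it."""
        return self._clusters.pop(name)


# Hypothetical cluster and node names.
registry = ClusterRegistry()
registry.create_cluster("preservation")
registry.add_node("preservation", "master")
registry.add_node("preservation", "worker-1")
registry.remove_node("preservation", "worker-1")
remaining = registry.nodes("preservation")
removed = registry.delete_cluster("preservation")
```

Deleting a cluster returns its nodes so that a caller could release the corresponding virtual machines; how the real toolkit handles this is not specified here.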
The PoC (Proof of Concept) demonstrates the operation of selected SCAPE Platform Components in Cloud Environments, focusing on Eucalyptus and Amazon EC2 and on using their scalability to provide on-demand computing capacity. More details, plus a user guide for the SCAPE Cloud Toolkit, can be found on Bitbucket. To orchestrate the deployment of the different components we use the Puppet Configuration Management System, customized to SCAPE needs. More details on the modules used within the SCAPE project are given in this Bitbucket project. Below are the Puppet recipes for the most common components and tools of the SCAPE platform:
- Taverna server
- Tomcat server
- SCAPE packages: for now, the following tools are integrated: jpylyzer, pagelyzer and xcorrsound. Please check the project home on Bitbucket for regular updates.
Each cluster's master node includes the Puppet master and PuppetDB, together with the Puppet recipes for the SCAPE components. Clusters also include predefined Puppet templates for the components to be installed on the nodes. On each cluster node, different components can be installed depending on the template selected when the node is deployed.
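The relation between a node template and the components installed on a node can be pictured as a simple lookup. The sketch below is an assumption-laden illustration: the template names (`workflow`, `characterisation`), the `scape::` class prefix and the function `render_node_manifest` are hypothetical and are not taken from the SCAPE Puppet modules. It only shows how a selected template could be expanded into a Puppet node definition.

```python
from typing import Dict, List

# Hypothetical templates; the component names come from the list above,
# the grouping into templates is an assumption made for this example.
TEMPLATES: Dict[str, List[str]] = {
    "workflow": ["taverna", "tomcat"],
    "characterisation": ["jpylyzer", "pagelyzer", "xcorrsound"],
}


def render_node_manifest(hostname: str, template: str) -> str:
    """Expand a template into a Puppet node definition for one host."""
    try:
        components = TEMPLATES[template]
    except KeyError:
        raise ValueError(f"unknown template: {template}") from None
    lines = [f"node '{hostname}' {{"]
    # One include per component; the class names are hypothetical.
    lines += [f"  include scape::{component}" for component in components]
    lines.append("}")
    return "\n".join(lines)


manifest = render_node_manifest("worker-1", "workflow")
print(manifest)
```

Two nodes deployed from the same template receive the same list of classes, which is what makes the resulting deployments identical regardless of the cloud hosting them.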