This page describes the installation of the SCAPE platform in a High Performance Computing centre. The hardware platform is described [here|].

h5. Overview

Deployment of the SCAPE platform in large data centres or in Cloud Computing environments raises specific challenges, such as dynamic allocation of components to compute nodes, monitoring of the platform, and quality of service. Automated cluster provisioning and platform deployment is achieved by integrating and/or extending a set of specific tools:
* A node deployment system, such as [Cobbler|], helps administrators to dynamically allocate nodes: systems can be added to and removed from the management of the node deployment and configuration management systems on the fly, both on bare-bones computing hardware and on virtualised computing resources.
* On-the-fly software deployment uses a customised Configuration Management System ([Puppet|]): it supports the evolution of SCAPE software packages by providing "high-level" recipes describing the tools and the relations between them. It enables dynamic allocation of SCAPE components to computing resources with minimal human intervention, providing a more deterministic software deployment process and ensuring that software is deployed as the developers expect.
* Monitoring/quality measures: the integrated Puppet Configuration Management system natively supports integration with, and deployment of, the [Nagios|] monitoring solution, allowing operators/administrators to provide a better QoS.

h5. Cloud Deployment Toolkit for SCAPE Platform

The toolkit provides the software components and the corresponding Puppet modules for deploying critical SCAPE components in cloud environments. The Proof of Concept (PoC) demonstrates the operation of selected SCAPE Platform components in cloud environments, focusing on Eucalyptus and Amazon EC2, and fosters their scalability for providing on-demand computing capacity. The toolkit comprises:
* a web-based GUI (portal) for managing a SCAPE Platform deployment on Eucalyptus-based clouds and, in later stages, on Amazon Web Services EC2
* integration of the Puppet and PuppetDB REST APIs
* an abstraction over the EC2 and Eucalyptus APIs that provides a uniform programming environment, thereby ensuring portability

More details, including a user guide for the SCAPE Cloud Toolkit, can be found on [Bitbucket|]. To orchestrate the deployment of the different components we use the Puppet Configuration Management System, customised to SCAPE's needs. More details on the modules used within the SCAPE project are given in this [Bitbucket project|]. Puppet recipes are provided for the most common components and tools of the SCAPE platform:
* [Taverna server|]
* [Tomcat server|]
* [SCAPE packages|]: for now, the following tools are integrated: jpylyzer, pagelyzer and xcorrsound. Please check the project home on Bitbucket for regular updates.

Another tool provides integration between the Hadoop Distributed File System (HDFS) and more 'classical' products such as FTP. It is an Apache MINA based FTP server that exposes the HDFS filesystem to local/remote clients lacking HDFS capabilities. One of its main use cases is to facilitate data staging between legacy HPC systems and Hadoop-based computing clusters. See the [Bitbucket|] project for source code and installation instructions.

h5. UVT Data Center services

Based on the requirements of the project partners (BUT, FIZ etc.), UVT provides a set of services supporting the various sub-projects and activities within SCAPE. These services can be classified as:
* Execution services
* Customisation services
* Data storage services
* IaaS services

h6. Execution Services

The execution services provided by UVT cover several facilities, including (but not restricted to) Batch Scheduling, MapReduce services, and the QosCosGrid Compute API.

The Batch Scheduling service is based on the IBM® LoadLeveler scheduling system. It allows the SCAPE partners to use UVT's computing facilities, featuring more than 40 x86 servers, and also provides access to specialised resources such as GPU computing nodes. The Batch Scheduling service is complemented by the QosCosGrid (QCG) API developed by the Application Department at PSNC, which allows integration with partner applications.

Besides the two batch-oriented services, UVT also provides access to Hadoop resources, both as a SCAPE-dedicated cluster (7 dedicated HP AMD64 servers) and as on-demand Hadoop clusters based on the [SCAPE Cloud Deployment toolkit|].
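To illustrate how a job reaches the LoadLeveler-based Batch Scheduling service described above, the sketch below generates a minimal job command file. The `# @` directive keywords are standard LoadLeveler; the class name and output paths are hypothetical examples, not UVT's actual configuration.

```python
def loadleveler_job(command, job_class="scape", job_type="serial"):
    """Build a minimal LoadLeveler job command file as a string.

    The '# @' lines are standard LoadLeveler directives; the class name
    "scape" is a hypothetical example, not UVT's actual scheduling class.
    """
    lines = [
        "#!/bin/sh",
        "# @ job_type = " + job_type,   # serial or parallel
        "# @ class = " + job_class,     # scheduling class (site specific)
        "# @ output = $(jobid).out",    # where stdout of the job goes
        "# @ error = $(jobid).err",     # where stderr of the job goes
        "# @ queue",                    # end of directives: enqueue this step
        command,                        # the actual workload to run
    ]
    return "\n".join(lines) + "\n"

# The resulting file would be submitted with: llsubmit job.cmd
job_file = loadleveler_job("echo hello from SCAPE")
```

In practice the generated file is written to disk and handed to `llsubmit`; the scheduler then places the job on one of the x86 or GPU nodes according to its class.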
h6. Customisation Services

On top of the execution services above, UVT also provides support for customising existing services to the requirements of SCAPE users. Examples include specific runtime configurations, covering both software and hardware. For instance, off-screen CUDA rendering support was provided specifically for BUT, allowing the execution of OpenGL applications on headless GPU systems.

h6. Data Storage Services

The Data Storage Services provide the project partners with storage space on UVT's infrastructure. The storage services include both GPFS storage (accessible via FTP/SFTP) and Hadoop HDFS storage.

The HDFS storage service is accessible both directly from Hadoop jobs and by means of the HDFS FTP Server developed by UVT in the frame of the SCAPE Project. The FTP server allows remote manipulation of the HDFS filesystem from legacy applications or commodity file transfer utilities.
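As a sketch of how a legacy application could use that FTP server, plain Python `ftplib` is enough; the gateway host and credentials below are hypothetical placeholders, since the real endpoint is site specific.

```python
import ftplib
import posixpath

# Hypothetical endpoint of the HDFS FTP gateway; the real host, port and
# credentials are site specific.
HDFS_FTP_HOST = "hdfs-ftp.example.org"

def hdfs_target_path(base_dir, filename):
    """Build the remote HDFS path for an uploaded file (POSIX separators)."""
    return posixpath.join(base_dir, filename)

def upload_to_hdfs(host, user, password, local_file, remote_dir):
    """Push a local file into HDFS through the FTP gateway using plain ftplib."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        target = hdfs_target_path(remote_dir, local_file.rsplit("/", 1)[-1])
        with open(local_file, "rb") as fh:
            # An ordinary FTP STOR; the gateway translates it into an HDFS write.
            ftp.storbinary("STOR " + target, fh)

# Example (requires a live gateway):
#   upload_to_hdfs(HDFS_FTP_HOST, "scape", "secret", "data/sample.warc", "/user/scape/ingest")
```

Because the client side is plain FTP, any commodity transfer tool can play the same role without knowing anything about HDFS.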

h6. IaaS Services

As part of the SCAPE Project, UVT provides IaaS hosting services for SCAPE Consortium members. These hosting services include the ability to deploy Virtual Machines on top of the IaaS infrastructure, specifically the Eucalyptus middleware. For instance, FIZ team members use the IaaS hosting facility, leveraging VMs to host Fedora Directory development.
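Since Eucalyptus implements the EC2 API, tooling written against EC2 can usually be pointed at the private cloud just by changing the endpoint. The sketch below builds connection settings in the style of boto3 keyword arguments; the Eucalyptus endpoint URL is a hypothetical example, not UVT's actual cloud controller address.

```python
def cloud_client_config(provider, region="us-east-1", euca_endpoint=None):
    """Return connection settings for an EC2-compatible API.

    Eucalyptus speaks the EC2 wire protocol, so the only difference from
    Amazon is the endpoint URL. The default euca_endpoint below is a
    hypothetical example value.
    """
    if provider == "aws":
        # Public Amazon EC2: the client library resolves the endpoint itself.
        return {"service_name": "ec2", "region_name": region}
    if provider == "eucalyptus":
        # Private cloud: point the same client at the cloud controller.
        return {
            "service_name": "ec2",
            "endpoint_url": euca_endpoint or "http://euca.example.org:8773/services/compute",
        }
    raise ValueError("unknown provider: " + provider)

# The returned dict could be fed to an EC2 client library, e.g.:
#   boto3.client(**cloud_client_config("eucalyptus"))
```

This single-switch design is what makes the toolkit's "uniform programming environment" over EC2 and Eucalyptus feasible: all higher-level code stays identical across the two clouds.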

h5. PSNC Data Center services

Based on the [scenarios identified by WCPT|SP:Medical Dataset], it was agreed to implement and deploy several services at PSNC that integrate with the WCPT working environment. The following services are needed to execute specific scenarios at WCPT:
* DICOM Download server - this service provides access to all anonymized DICOM files stored at PSNC. It is necessary for the scenario named [large-scale access at hospital|SP:Large scale access at hospital], because the working environment at WCPT needs the stored DICOM files in order to present them to all interested WCPT users. It will also be used in the educational portal (related to the [large scale access for educational purposes|SP:Large scale access for educational purposes] scenario) as a background service for accessing DICOM files. Project repository: []
* DICOM HDFS-enabled server - this service allows uploading anonymized DICOM files to PSNC's HDFS cluster. It is necessary for the scenario named [large-scale ingest of medical data|SP:Large scale ingest of medical data], as the working environment at WCPT needs to transfer anonymized DICOM files to the PSNC Data Center. This service is also an important building block for the [large scale access for educational purposes|SP:Large scale access for educational purposes] scenario, because the educational portal will provide an on-line viewer of the stored DICOM files. Project repository: []
* HL7 HDFS-enabled gateway - this service allows uploading HL7 metadata files about patients' visits. It is necessary for the scenario named [large-scale ingest of medical data|SP:Large scale ingest of medical data], as the working environment at WCPT needs to transfer HL7 files to the PSNC Data Center. It is also important in the context of the [large scale access for educational purposes|SP:Large scale access for educational purposes] and [large scale analysis|SP:Large scale analysis] scenarios: in the former the HL7 data needs to be presented on-line, while in the latter the HL7 files will be analyzed with dedicated Hadoop jobs. Project repository: []
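The exact upload protocol of these HDFS-enabled gateways is not specified on this page, but a natural building block is Hadoop's standard WebHDFS REST interface, where an HTTP PUT initiates a file CREATE. The sketch below builds such a URL; the namenode host and HDFS path are illustrative placeholders, not PSNC's actual endpoints.

```python
from urllib.parse import quote

def webhdfs_create_url(namenode, hdfs_path, port=50070, overwrite=False):
    """Build the WebHDFS URL that initiates a file CREATE in HDFS.

    WebHDFS is Hadoop's standard REST interface; an HTTP PUT to this URL
    is redirected to a datanode, which then receives the file body.
    Port 50070 is the Hadoop 2 namenode default.
    """
    return "http://{}:{}/webhdfs/v1{}?op=CREATE&overwrite={}".format(
        namenode, port, quote(hdfs_path), str(overwrite).lower()
    )

# Hypothetical example: ingesting one anonymized DICOM file.
url = webhdfs_create_url("namenode.example.org", "/medical/dicom/study1/img001.dcm")
```

A gateway built this way keeps the WCPT side simple (plain HTTP uploads) while the files land directly in the HDFS cluster, ready for the Hadoop analysis jobs mentioned above.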