Paweł Kominek, Michał Kozak, Aleksander Stroiński, Tomasz Parkoła
The DICOM medical data for this experiment comes from the overall WCPT dataset described here: http://wiki.opf-labs.org/display/SP/WCPT+medical+dataset. For this particular experiment we envision a data storage test of approx. 10 GB of medical data, which is approximately the amount of data produced by the WCPT hospital in one day.
Purpose of this experiment
The main goal of this experiment is to evaluate how quickly the data centre facilities can process the amount of data produced by WCPT in one day. The evaluation should measure the time needed to store the data on the HDFS cluster, store it in the country-wide cloud storage, and ingest the metadata into HBase. The evaluation should reflect a real-world scenario; it is therefore executed over the network connection between WCPT and PSNC, which is limited by an asynchronous link with 100 Mbps throughput.
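As a sanity check on the measured times, a back-of-the-envelope calculation gives the theoretical lower bound for the transfer step alone (a sketch using decimal units; real throughput will be lower due to protocol overhead and other traffic on the link):

```python
# Theoretical minimum time to move the daily ~10 GB over the
# 100 Mbps WCPT-PSNC link (decimal units, overhead ignored).
data_gb = 10                       # approx. daily volume in gigabytes
link_mbps = 100                    # link throughput in megabits per second

megabits = data_gb * 1000 * 8      # 10 GB = 80,000 megabits
seconds = megabits / link_mbps     # 800 s
print(f"{seconds:.0f} s = {seconds / 60:.1f} min")  # → 800 s = 13.3 min
```

So even before validation, HBase ingest and cloud backup are counted, the network link alone puts a floor of roughly 13 minutes on processing one day of data.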
The experiment is run on the PSNC Hadoop Platform (http://wiki.opf-labs.org/display/SP/PSNC+Hadoop+Platform).
The ingestion process investigated in this experiment is composed of the following steps:
- Receive data from the WCPT endpoint and store it on HDFS
- Validate the received data and extract the necessary information (validate required DICOM tags, extract information from certain DICOM tags)
- Store the information extracted from the DICOM tags in HBase (this step is necessary for further processing)
- Rename the DICOM file
- Store a backup copy of the DICOM file in the cloud storage
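The per-file steps above can be sketched as a small pipeline. This is only an illustration of the control flow: the tag extraction, the HBase table and the cloud storage are stubbed out with stand-ins (a dict and a local directory), and the placeholder UID, file names and helper functions are hypothetical, not taken from the experiment itself. A real implementation would use a DICOM library (e.g. pydicom), an HDFS client and an HBase client instead.

```python
import os
import shutil
import tempfile

def is_valid_dicom(path):
    """Minimal validity check: a DICOM Part 10 file carries the magic
    bytes 'DICM' after a 128-byte preamble."""
    with open(path, "rb") as f:
        head = f.read(132)
    return len(head) == 132 and head[128:132] == b"DICM"

def ingest(path, metadata_store, archive_dir):
    # Step 1 (receive and store on HDFS) is assumed to have happened;
    # `path` is the received file.
    # Step 2: validate the received file.
    if not is_valid_dicom(path):
        raise ValueError(f"not a DICOM Part 10 file: {path}")
    # Step 3: extract required tag values and store them in the metadata
    # store (a dict here standing in for an HBase table).
    study_uid = "1.2.3.4"  # placeholder: would be read from tag (0020,000D)
    metadata_store[study_uid] = {"source_path": path}
    # Step 4: rename the DICOM file (here: to <study_uid>.dcm).
    renamed = os.path.join(os.path.dirname(path), study_uid + ".dcm")
    os.rename(path, renamed)
    # Step 5: store a backup copy (archive_dir stands in for cloud storage).
    shutil.copy2(renamed, archive_dir)
    return renamed

# Usage: build a minimal fake DICOM file and run it through the pipeline.
workdir = tempfile.mkdtemp()
archive = tempfile.mkdtemp()
sample = os.path.join(workdir, "incoming.dcm")
with open(sample, "wb") as f:
    f.write(b"\x00" * 128 + b"DICM")
store = {}
renamed = ingest(sample, store, archive)
print(os.path.basename(renamed))  # → 1.2.3.4.dcm
```

Timing each of these steps separately (transfer, validation/extraction, HBase write, cloud copy) is what allows the evaluation to attribute the overall ingest time to individual stages.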
Requirements and Policies
It is required that the experiment is executed in a real-world environment, which means that it needs to involve both the WCPT and PSNC sides. It should be initiated from the WCPT environment, and the measurements should be taken at the PSNC side, where all of the investigated activities take place.
Links to results of the experiment using the evaluation template.