Tomasz Hofmann (PSNC)

Evaluation points

In this task, the efficiency of a Hadoop job that gathers information about abnormal laboratory results for specified ICD10 codes was evaluated. The metric used is the number of objects processed per second, i.e. the number of HL7 files parsed per second. Note that Test 2 was performed on the local file system, so its result is not directly comparable with the results of the other tests.

Assessment of measurable points
Metric | Description | Metric baseline | Metric goal | July 21, 2014 [Test 1] | July 28, 2014 [Test 2]
number of objects per second | number of HL7 files parsed in one second | - | - | 69.60 [obj/s] | 66.17 [obj/s]

Note: one HL7 file is counted as one object.
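The metric is the number of parsed HL7 files divided by the processing time. Below is a minimal sketch of that calculation. It assumes, as the note above states, that one HL7 file counts as one object, and also that the map-task record count in Table 2 equals the number of files parsed. The function name objects_per_second is hypothetical.

```python
def objects_per_second(num_files: int, processing_time_s: float) -> float:
    """Throughput metric: HL7 files parsed per second (one file = one object)."""
    if processing_time_s <= 0:
        raise ValueError("processing time must be positive")
    return num_files / processing_time_s


if __name__ == "__main__":
    # Assumption: record counts and times are taken from Table 2 (map task).
    for test, files, seconds in [("Test 1", 13658, 196), ("Test 2", 18827, 284)]:
        print(f"{test}: {objects_per_second(files, seconds):.2f} obj/s")
```

With the Table 2 figures this gives about 69.68 and 66.29 obj/s. These are close to, but not identical with, the values reported in the metrics table. This page does not state which timing was used for the reported figures.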

Metrics must be registered in the metrics catalogue.

The following ICD10 codes were analyzed:
  • A15.0 - Tuberculosis of lung, confirmed by sputum microscopy with or without culture
  • A15.1 - Tuberculosis of lung, confirmed by culture only
  • J85.1 - Abscess of lung with pneumonia

Overall statistics are presented in Table 1. Tables 2 and 3 show the execution times of the map and reduce tasks, respectively.

Table 1. Overall statistics

Parameter | Test 1 | Test 2
Analyzed period | 1.06.2013-1.07.2014 | 1.01.2013-31.12.2013
Processing time | 196 [s] | 284 [s]

Table 2. Statistics for map task

Parameter | Test 1 | Test 2
Processing time (for all records) | 196 [s] | 284 [s]
Number of records | 13,658 | 18,827

Table 3. Statistics for reduce task

Parameter | Test 1 | Test 2
Processing time (for all records) | 0.008 [s] | 0.006 [s]
Number of records | - | -

Raw log files

Technical details


The experiment consists of the following steps (following the MapReduce scheme):

  1. the map task:
    1. for each HL7 file stored on HDFS:
      1. parse the document to find the abnormal laboratory results, count them, and write the following pair to the context: Key=ICD10 code, Value=number of abnormal results
  2. the reduce task:
    1. for each ICD10 code, accumulate the counts of abnormal results
    2. produce the result pair Key=ICD10 code, Value=total number of abnormal results
  3. statistics are gathered by downloading and parsing the log files
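The map and reduce steps above can be sketched as a sequential, in-memory Python program. No Hadoop is involved, and the shuffle/grouping phase is folded into a dictionary. The page does not say which HL7 format the job parses, so this sketch makes several assumptions:

- the files are pipe-delimited HL7 v2 messages;
- the ICD10 code is taken from the first component of DG1-3;
- a laboratory result counts as abnormal when its OBX-8 (Abnormal Flags) field is neither empty nor "N".

The functions map_hl7, reduce_counts and run_job are hypothetical names, not the job's actual classes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Assumption: OBX-8 values treated as "normal"; anything else (H, L, A, ...) is abnormal.
NORMAL_FLAGS = {"", "N"}


def map_hl7(message: str, icd10_codes: Set[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (ICD10 code, number of abnormal results) for one HL7 file."""
    segments = [s.split("|") for s in message.replace("\n", "\r").split("\r") if s]
    diagnoses: Set[str] = set()
    abnormal = 0
    for fields in segments:
        if fields[0] == "DG1" and len(fields) > 3:
            # DG1-3 is a coded element; its first component is the code itself.
            diagnoses.add(fields[3].split("^")[0])
        elif fields[0] == "OBX" and len(fields) > 8:
            # OBX-8 holds the abnormal flag of a single observation.
            if fields[8] not in NORMAL_FLAGS:
                abnormal += 1
    # Emit one pair per requested ICD10 code found in this file.
    for code in sorted(diagnoses & icd10_codes):
        yield code, abnormal


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the abnormal-result counts per ICD10 code."""
    totals: Dict[str, int] = defaultdict(int)
    for code, count in pairs:
        totals[code] += count
    return dict(totals)


def run_job(messages: List[str], icd10_codes: Set[str]) -> Dict[str, int]:
    """Sequential stand-in for the MapReduce job over a list of HL7 messages."""
    pairs = (pair for msg in messages for pair in map_hl7(msg, icd10_codes))
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = "\r".join([
        "MSH|^~\\&|LAB|HOSP|||20140101||ORU^R01|1|P|2.5",
        "DG1|1||J85.1^Abscess of lung with pneumonia^I10",
        "OBX|1|NM|RDW||16.9|%|11.5-14.5|H",
        "OBX|2|NM|HGB||13.5|g/dL|12.0-16.0|N",
    ])
    print(run_job([sample], {"A15.0", "A15.1", "J85.1"}))  # {'J85.1': 1}
```

In the real job, each map call runs on a separate HDFS file, and the reducer receives the values already grouped by key. The summation is the same.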

Scripts used to execute the evaluation

Execution commands

./ -admission 20090601 -destination tomek/tmp/laboratory11 -discharge 20140710 -hospital wcpit -icd10s J85.1 -icd10s A15.0 -icd10s A15.1 -laboratory RDW -width 800 -height 600
./ laboratory

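Important note: change the -destination value for each job execution. This is presumably because it is the job's output directory, and Hadoop refuses to start a job whose output directory already exists.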
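One way to avoid reusing a destination is to derive a fresh one for every run. The original tests used hand-numbered paths such as tomek/tmp/laboratory11. The sketch below instead appends a UTC timestamp, which is an assumed convention. The helpers unique_destination and build_args are hypothetical. The argument values are copied from the execution command above. The launcher script's name is elided on this page, so the sketch only prints the argument string rather than invoking anything.

```python
import time
from typing import List, Optional


def unique_destination(base: str, timestamp: Optional[float] = None) -> str:
    """Append a UTC timestamp to the base path so each run gets a new -destination."""
    moment = time.gmtime(time.time() if timestamp is None else timestamp)
    return f"{base}_{time.strftime('%Y%m%dT%H%M%S', moment)}"


def build_args(destination: str) -> List[str]:
    """Arguments copied from the execution command above, with -destination replaced."""
    return [
        "-admission", "20090601",
        "-destination", destination,
        "-discharge", "20140710",
        "-hospital", "wcpit",
        "-icd10s", "J85.1", "-icd10s", "A15.0", "-icd10s", "A15.1",
        "-laboratory", "RDW",
        "-width", "800", "-height", "600",
    ]


if __name__ == "__main__":
    # The script name is elided on this page, so only the arguments are printed.
    print(" ".join(build_args(unique_destination("tomek/tmp/laboratory"))))
```

Two runs started within the same second would still collide. A finer-grained suffix or a counter would avoid that.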
