
Evaluator(s)
Tomasz Hofmann (PSNC)
Evaluation points
This task evaluates the efficiency of a Hadoop job that gathers information about abnormal laboratory results for a specified ICD-10 code. The metric is the number of objects per second, i.e. the number of HL7 files parsed per second. Note that Test 2 was performed on the local file system, so its result is not comparable with the results of the other tests.
Assessment of measurable points
Metric | Description | Metric baseline | Metric goal | July 21, 2014 [Test 1] | July 28, 2014 [Test 2] |
---|---|---|---|---|---|
number of objects per second | the number of HL7 files parsed in one second | - | - | 69.60 [obj/s] | 66.17 [obj/s] |
Note: one HL7 file is treated as one object.
Metrics must be registered in the metrics catalogue
Table 1. Overall statistics
Parameter | Test 1 | Test 2 |
---|---|---|
Analyzed period | 1.06.2013-1.07.2014 | 1.01.2013-31.12.2013 |
Processing time | 196 [s] | 284 [s] |
Table 2. Statistics for map task
Parameter | Test 1 | Test 2 |
---|---|---|
Processing time (for all records) | 196 [s] | 284 [s] |
Number of records | 13 658 | 18 827 |
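As a rough cross-check, the objects-per-second metric can be recomputed from the map task statistics above. The sketch below is only an illustration: the function name `objects_per_second` and the `TESTS` dictionary are hypothetical, and the inputs are the rounded values from the table, so the results differ slightly from the reported figures, which may have been derived from unrounded timings.

```python
def objects_per_second(num_records, processing_time_s):
    """Return throughput as parsed HL7 files (objects) per second."""
    if processing_time_s <= 0:
        raise ValueError("processing time must be positive")
    return num_records / processing_time_s


# Rounded values copied from the map task statistics:
# (number of records, processing time in seconds)
TESTS = {
    "Test 1": (13658, 196),
    "Test 2": (18827, 284),
}

for name, (records, seconds) in TESTS.items():
    print(f"{name}: {objects_per_second(records, seconds):.2f} obj/s")
```

Because the reduce task takes only a few milliseconds, the map task time dominates and is used as the denominator here.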
Table 3. Statistics for reduce task
Parameter | Test 1 | Test 2 |
---|---|---|
Processing time (for all records) |
0,008 [s] | 0,006 [s] |
Number of records |
- | - |
Raw log files
Technical details
Workflow
The experiment consists of the following steps (according to the MapReduce scheme):
- the map task [laboratory.sh]:
  - for each HL7 file saved on HDFS do:
    - parse the document to find abnormal laboratory results, count them, and add the following pair to the context: Key=icd10 code, Value=the count of the abnormal results
- the reduce task [laboratory.sh]:
  - for each icd10 code accumulate the count of the abnormal results
  - produce the result pair Key=icd10 code, Value=the number of abnormal results
- statistics are gathered by downloading and parsing log files [test.sh]
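The map and reduce steps above can be sketched in plain Python, without Hadoop. This is a minimal illustration and not the code behind laboratory.sh. It assumes pipe-delimited HL7 v2 messages, with the diagnosis code in field DG1-3 and the abnormal flag in field OBX-8, and it treats any flag other than empty or `N` as abnormal. The function names and the sample messages are hypothetical.

```python
from collections import defaultdict


def map_hl7(message):
    """Map step: yield (icd10 code, abnormal result count) for one HL7 message.

    Assumption: HL7 v2 layout, diagnosis code in DG1-3, abnormal flag in OBX-8.
    """
    icd10 = None
    abnormal = 0
    # HL7 v2 separates segments with carriage returns; newlines are accepted too.
    for segment in message.replace("\r", "\n").split("\n"):
        fields = segment.strip().split("|")
        if fields[0] == "DG1" and len(fields) > 3 and icd10 is None:
            # DG1-3 is a coded element: code^text^coding system
            icd10 = fields[3].split("^")[0]
        elif fields[0] == "OBX" and len(fields) > 8:
            # Assumption: empty or "N" means normal, anything else is abnormal.
            if fields[8] not in ("", "N"):
                abnormal += 1
    if icd10:
        yield icd10, abnormal


def reduce_counts(pairs):
    """Reduce step: accumulate the abnormal result counts per icd10 code."""
    totals = defaultdict(int)
    for code, count in pairs:
        totals[code] += count
    return dict(totals)


def run_job(messages):
    """Run the map step on every message, then reduce all emitted pairs."""
    pairs = (pair for message in messages for pair in map_hl7(message))
    return reduce_counts(pairs)


# Hypothetical sample messages, for illustration only.
SAMPLES = [
    "MSH|^~\\&|LAB\rDG1|1||E11^Type 2 diabetes^I10\r"
    "OBX|1|NM|GLU||180|mg/dL|70-99|H\rOBX|2|NM|NA||140|mmol/L|135-145|N",
    "MSH|^~\\&|LAB\rDG1|1||E11^Type 2 diabetes^I10\r"
    "OBX|1|NM|GLU||55|mg/dL|70-99|L",
    "MSH|^~\\&|LAB\rDG1|1||I10^Hypertension^I10\r"
    "OBX|1|NM|NA||140|mmol/L|135-145|N",
]

print(run_job(SAMPLES))
```

In the actual job the framework groups the emitted pairs by key before the reduce task runs. The sketch folds that grouping into `reduce_counts` to stay self-contained.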
Scripts