h2. Evaluator(s)

_Tomasz Hofmann (PSNC)_

h2. Evaluation points

The main goal of this evaluation was to analyse the number of abnormal laboratory examination results for given ICD10 disease codes in a given period. The investigated period and the list of ICD10 codes are the input parameters of the analysis algorithm. Statistics were gathered using the [PSNC Hadoop cluster|SP:PSNC Hadoop Platform] and the MapReduce approach. The [number of objects per second|http://purl.org/DP/quality/measures#418] was selected as the evaluation metric, where an object is a single HL7 file stored in HDFS.

h5. Assessment of measurable points

|| Metric || Description || Metric baseline || Metric goal || _July 21, 2014 \[Test 1\]_ || _July 28, 2014 \[Test 2\]_ || _July 30, 2014 \[Test 3\]_ ||
| [number of objects per second|http://purl.org/DP/quality/measures#418] | _number of HL7 files processed per second_ | _\-_ | _\-_ | 4.196 \[obj/s\] | 4.761 \[obj/s\] | 4.979 \[obj/s\] |
_Note: we proposed to treat one HL7 file as one object._

_Metrics must be registered in the [metrics catalogue|SP:Metrics catalogue]._

*Visualisation of results*

The chart below presents the results of the analysis for Test 2. Colours indicate different ICD10 disease codes. The test covered patients who visited the WCPT hospital between 1.01.2013 and 31.12.2013. Each column shows the number of abnormal laboratory examination results for all patients. The following ICD10 disease codes were investigated:
* A15.0 \- Tuberculosis of lung, confirmed by sputum microscopy with or without culture
* A15.1 \- Tuberculosis of lung, confirmed by culture only
* J85.1 \- Abscess of lung with pneumonia 

!laboratoryChart.png|border=1!

*Additional information*

Table 1 presents the processing time of the whole job for each test. Tables 2 and 3 give the execution time and the number of processed records for the map and reduce tasks, respectively. The execution times (and hence the performance) of the three tests are similar because, regardless of the analysed period, all HL7 files stored in the cluster have to be processed.
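The last point can be illustrated with a minimal sketch. It assumes that the admission and discharge dates are only available inside each HL7 file, so the period filter can only be applied in the map task after the file has been read. The file names, dates and the `scan` function are hypothetical, and the rule that a stay must lie entirely within the period is an assumption.

```python
from datetime import date

# Hypothetical admission/discharge dates per HL7 file. In the real job these
# dates are known only after each file on HDFS has been read and parsed.
FILES = {
    "patient-a.hl7": (date(2012, 9, 3), date(2012, 9, 10)),
    "patient-b.hl7": (date(2013, 4, 15), date(2013, 4, 20)),
    "patient-c.hl7": (date(2014, 2, 1), date(2014, 2, 7)),
}


def scan(files, start, end):
    """Return (files_read, files_in_period) for the period [start, end].

    Assumption: a stay counts if it lies entirely within the period.
    """
    read = in_period = 0
    for admission, discharge in files.values():
        read += 1  # every file is opened, whatever the period
        if start <= admission and discharge <= end:
            in_period += 1
    return read, in_period


if __name__ == "__main__":
    # Periods of Tests 1-3: the number of files read never changes.
    for start, end in [
        (date(2012, 7, 1), date(2014, 7, 1)),
        (date(2013, 1, 1), date(2013, 12, 31)),
        (date(2014, 1, 1), date(2014, 5, 1)),
    ]:
        print(start, end, scan(FILES, start, end))
```

Narrowing the period changes only how many files contribute to the counts, not how many files are read.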



*Table 1. Overall statistics*
|| Parameter || Test 1 || Test 2 || Test 3 ||
| Analysed period | 1.07.2012-1.07.2014 | 1.01.2013-31.12.2013 | 1.01.2014-1.05.2014 |
| Processing time | 80 \[min\] | 71 \[min\] | 68 \[min\] |

*Table 2. Statistics for the map task*
|| Parameter || Test 1 || Test 2 || Test 3 ||
| Processing time (for all records) | 80 \[min\] | 71 \[min\] | 68 \[min\] |
| Number of records | 20 141 | 20 285 | 20 315 |

*Table 3. Statistics for the reduce task*
|| Parameter || Test 1 || Test 2 || Test 3 ||
| Processing time (for all records) | 30 \[s\] | 26 \[s\] | 24 \[s\] |
| Number of records | \- | \- | \- |
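The throughput values in the assessment table can be reproduced from Tables 1 and 2. The sketch below assumes that the metric is the number of map input records (one record per HL7 file) divided by the whole-job processing time in seconds, truncated to three decimal places. This reading matches all three reported values, but it is a reconstruction, not a definition taken from the metrics catalogue.

```python
import math

# Record counts (Table 2) and whole-job processing times in minutes (Table 1).
TESTS = {
    "Test 1": (20141, 80),
    "Test 2": (20285, 71),
    "Test 3": (20315, 68),
}


def objects_per_second(records: int, minutes: int) -> float:
    """Throughput in HL7 files per second.

    Assumption: the reported figures are truncated (not rounded)
    to three decimal places.
    """
    rate = records / (minutes * 60)
    return math.floor(rate * 1000) / 1000


if __name__ == "__main__":
    for name, (records, minutes) in TESTS.items():
        print(f"{name}: {objects_per_second(records, minutes):.3f} obj/s")
```

Rounding instead of truncating would give 4.762 rather than 4.761 for Test 2; the other two values are the same either way.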



h2. Technical details

h5. Workflow

The experiment consists of the following steps (following the MapReduce scheme):
# the map task \[[laboratory.sh|https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/jobs-scripts/laboratory.sh]\]:
## for each HL7 file stored on HDFS:
### parse the document to find the abnormal laboratory results, count them, and write the pair Key=ICD10 code, Value=number of abnormal results to the context
# the reduce task \[[laboratory.sh|https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/jobs-scripts/laboratory.sh]\]:
## for each ICD10 code, sum the counts of abnormal results
## emit the result pair Key=ICD10 code, Value=total number of abnormal results
# statistics are gathered by downloading and parsing the log files \[[test.sh|https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/test.sh]\]
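The map and reduce steps above can be illustrated with a small local simulation that needs no Hadoop installation. It is a sketch under explicit assumptions, not the code of the `laboratory` job: it assumes HL7 v2 pipe-delimited messages with segments separated by carriage returns, takes diagnosis codes from DG1-3, the examination code from OBX-3 and the abnormal flag from OBX-8, and counts any non-empty flag other than `N` as abnormal. The functions `map_hl7`, `reduce_counts` and `run_job` and the sample messages are hypothetical.

```python
from collections import defaultdict

# Assumed HL7 v2 encoding: segments separated by CR, fields by "|",
# components by "^".
SEGMENT_SEP, FIELD_SEP, COMPONENT_SEP = "\r", "|", "^"


def map_hl7(message, icd10_codes, lab_code):
    """Map step for one HL7 file: emit (ICD10 code, abnormal result count)."""
    diagnoses = set()
    abnormal = 0
    for segment in message.split(SEGMENT_SEP):
        fields = segment.split(FIELD_SEP)
        if fields[0] == "DG1" and len(fields) > 3:
            # DG1-3: diagnosis code; the first component is the code itself.
            code = fields[3].split(COMPONENT_SEP)[0]
            if code in icd10_codes:
                diagnoses.add(code)
        elif fields[0] == "OBX" and len(fields) > 8:
            # OBX-3: examination identifier, OBX-8: abnormal flag.
            exam = fields[3].split(COMPONENT_SEP)[0]
            flag = fields[8].strip()
            if exam == lab_code and flag not in ("", "N"):
                abnormal += 1
    return [(code, abnormal) for code in sorted(diagnoses)]


def reduce_counts(pairs):
    """Shuffle and reduce combined: group the pairs by ICD10 code and sum."""
    totals = defaultdict(int)
    for code, count in pairs:
        totals[code] += count
    return dict(totals)


def run_job(messages, icd10_codes, lab_code):
    """Run the map step over every message, then the reduce step."""
    pairs = []
    for message in messages:
        pairs.extend(map_hl7(message, icd10_codes, lab_code))
    return reduce_counts(pairs)


# Three made-up messages (hypothetical data, for illustration only).
SAMPLE = [
    "\r".join([
        "DG1|1||A15.0^Tuberculosis of lung^I10",
        "OBX|1|NM|RDW^Red cell distribution width||16.2|%|11.5-14.5|H",
        "OBX|2|NM|RDW^Red cell distribution width||13.0|%|11.5-14.5|N",
    ]),
    "\r".join([
        "DG1|1||J85.1^Abscess of lung with pneumonia^I10",
        "OBX|1|NM|RDW^Red cell distribution width||10.9|%|11.5-14.5|L",
        "OBX|2|NM|HGB^Haemoglobin||17.9|g/dL|12.0-16.0|H",
    ]),
    "\r".join([
        "DG1|1||A15.0^Tuberculosis of lung^I10",
        "DG1|2||A15.1^Tuberculosis of lung^I10",
        "OBX|1|NM|RDW^Red cell distribution width||15.1|%|11.5-14.5|H",
    ]),
]

if __name__ == "__main__":
    print(run_job(SAMPLE, {"A15.0", "A15.1", "J85.1"}, "RDW"))
```

In this sketch a file with several matching diagnoses contributes its abnormal count to each of them; whether the real job behaves this way is not stated in the source.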

*Scripts used to execute the evaluation*


[https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/jobs-scripts/laboratory.sh]

[https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/jobs-scripts/test.sh]

*Execution commands*

{noformat}
./laboratory.sh -admission 20090601 -destination ./test3/laboratory.png -discharge 20140710 -hospital wcpit -icd10s J85.1 -icd10s A15.0 -icd10s A15.1 -laboratory RDW -width 800 -height 600
./test.sh laboratory

where:
-admission : date of patient admission to hospital (format YYYYMMDD)
-discharge : date of patient discharge from hospital (format YYYYMMDD)
-destination : output path for the Hadoop job results (must be different for each job execution)
-hospital : hospital identifier (e.g. wcpit)
-width : width of the chart in pixels
-height : height of the chart in pixels
-icd10s : ICD10 code to analyse (repeat the option for each code)
-laboratory : laboratory examination code (e.g. RDW or another code present in the HL7 files)
{noformat}
*Important note:* use a different _\-destination_ for each job execution.
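Because the destination must differ between executions, it can help to build the command programmatically. The sketch below only assembles the options shown in the example command. The `build_laboratory_command` helper and its timestamp-based naming of the destination are hypothetical, not part of the evaluation scripts. With `run_id="test3"` it reproduces the example command exactly.

```python
import shlex
from datetime import datetime


def build_laboratory_command(admission, discharge, hospital, icd10_codes,
                             laboratory, width=800, height=600, run_id=None):
    """Return the argument list for one ./laboratory.sh execution."""
    if run_id is None:
        # Hypothetical naming scheme: a timestamp keeps -destination unique
        # across runs started at least one second apart.
        run_id = datetime.now().strftime("%Y%m%d-%H%M%S")
    argv = [
        "./laboratory.sh",
        "-admission", admission,
        "-destination", f"./{run_id}/laboratory.png",
        "-discharge", discharge,
        "-hospital", hospital,
    ]
    for code in icd10_codes:
        argv += ["-icd10s", code]  # one -icd10s option per ICD10 code
    argv += ["-laboratory", laboratory,
             "-width", str(width), "-height", str(height)]
    return argv


if __name__ == "__main__":
    argv = build_laboratory_command(
        "20090601", "20140710", "wcpit", ["J85.1", "A15.0", "A15.1"], "RDW",
        run_id="test3")
    print(shlex.join(argv))
```

The returned list can also be passed unchanged to `subprocess.run`, which avoids shell quoting issues.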

*Hadoop job*

[https://git.man.poznan.pl/stash/projects/SCAP/repos/mr-jobs/browse/epidemic-jobs/laboratory]