
h2. Evaluator(s)

_Tomasz Hofmann (PSNC)_

h2. Evaluation points

The main goal of this evaluation was to run an analysis on the medical data stored at the Medical Data Center and to obtain statistics on the number of medical cases related to a given ICD10 code in a given period. The analysed period is additionally split into a given number of sub-periods. The analysed period, the number of sub-periods and the ICD10 code are the input parameters of the analysis algorithm. Statistics were gathered on the [PSNC Hadoop Platform|http://wiki.opf-labs.org/display/SP/PSNC+Hadoop+Platform] using the MapReduce approach. The metric used was [number of objects per second|http://purl.org/DP/quality/measures#418], i.e. the number of records processed per second.

h5. Assessment of measurable points

|| Metric || Description || Metric baseline || Metric goal || July 21, 2014 \[Test 1\] || July 28, 2014 \[Test 2\] || July 30, 2014 \[Test 3\] ||
| [number of objects per second|http://purl.org/DP/quality/measures#418] | _number of records processed per second_ | \- | \- | 2563 \[obj/s\] | 3731 \[obj/s\] | 2041 \[obj/s\] |

_Note: an object is defined as one scanned cell in the HBase table (one record)._

_Metrics must be registered in the_ _[metrics catalogue|SP:Metrics catalogue]_

*Visualisation of results*

The chart below presents the results of the analysis for Test 2. Colours indicate different sub-periods. The test was performed for patients who visited the WCPT hospital between 1-01-2013 and 31-12-2013. This period is split into 5 sub-periods, as shown on the chart (each sub-period corresponds to one column). Each column indicates the number of patient visits for the given ICD10 code in that sub-period. The ICD10 code in this test was set to J85.1 \- Abscess of lung with pneumonia. The chart also shows the total number of cases found in the whole period (33 in this case).
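
The text does not state how the period is divided into sub-periods. The sketch below assumes equal-length sub-periods measured in days and shows how a one-year period such as that of Test 2 could be cut into 5 consecutive, non-overlapping sub-periods. The helper names split_period and _ceil_div are hypothetical and are not taken from the casecounter job.

```python
from datetime import date, timedelta


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def split_period(start: date, end: date, n: int) -> list[tuple[date, date]]:
    """Split the inclusive date range [start, end] into n consecutive,
    non-overlapping sub-periods whose lengths differ by at most one day.

    Hypothetical helper: equal-length splitting is an assumption, not
    necessarily what the casecounter job does.
    """
    total_days = (end - start).days + 1  # both ends are inclusive
    if n <= 0 or n > total_days:
        raise ValueError("n must be between 1 and the number of days in the period")
    subperiods = []
    for i in range(n):
        # Day offset k belongs to sub-period i exactly when i*T/n <= k < (i+1)*T/n.
        first = start + timedelta(days=_ceil_div(i * total_days, n))
        last = start + timedelta(days=_ceil_div((i + 1) * total_days, n) - 1)
        subperiods.append((first, last))
    return subperiods


# Test 2 period (1-01-2013 to 31-12-2013) split into 5 sub-periods.
for i, (first, last) in enumerate(split_period(date(2013, 1, 1), date(2013, 12, 31), 5)):
    print(i, first, last)
```

With 365 days and 5 sub-periods each sub-period is exactly 73 days long; when the number of days is not divisible by the number of sub-periods, the lengths differ by at most one day.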

    !casecounterChart.png|border=1!

Table 1 presents the processing time of the whole job for each test. Tables 2 and 3 provide the execution time and the number of processed records for the map and reduce tasks, respectively.



*Table 1. Overall statistics*
|| Parameter || Test 1 || Test 2 || Test 3 ||
| Analysed period | 1.07.2012-1.07.2014 | 1.01.2013-31.12.2013 | 1.01.2014-1.05.2014 |
| Processing time \[s\] | 64 | 39 | 5 |

*Table 2. Statistics for the map task*
|| Parameter || Test 1 || Test 2 || Test 3 ||
| Processing time for all records \[s\] | 64 | 39 | 5 |
| Number of records | 165,518 | 146,608 | 9,287 |

*Table 3. Statistics for the reduce task*
|| Parameter || Test 1 || Test 2 || Test 3 ||
| Processing time for all records \[s\] | 0.031 | 0.022 | 0.003 |
| Number of records | 894 | 640 | 83 |
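
The objects-per-second metric can be approximated from Table 2 above by dividing the number of records by the map processing time. The sketch below only restates that arithmetic; the helper name objects_per_second is hypothetical. It yields about 2586, 3759 and 1857 obj/s, which does not exactly reproduce the reported 2563, 3731 and 2041 obj/s. This suggests, though the text does not say so, that the metric was computed from more precise timings than the whole seconds shown in the tables.

```python
# Values copied from Table 2 (map task): number of records and processing time in seconds.
MAP_STATS = {
    "Test 1": (165_518, 64),
    "Test 2": (146_608, 39),
    "Test 3": (9_287, 5),
}


def objects_per_second(records: int, seconds: float) -> float:
    """Throughput metric: processed records (scanned HBase cells) per second."""
    if seconds <= 0:
        raise ValueError("processing time must be positive")
    return records / seconds


for test, (records, seconds) in MAP_STATS.items():
    print(f"{test}: {objects_per_second(records, seconds):.0f} obj/s")
```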


h2. Technical details

*Workflow*
# The experiment consists of the following steps, following the MapReduce model \[[casecounter.sh|https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/jobs-scripts/casecounter.sh]\]:
## the map task:
### for each tuple in the visits table:
#### if the ICD10 code of the visit is in the set of given ICD10 codes and the visit falls within the given period, find the sub-period ID and write the pair Key=subperiod_id, Value=visit_id to the context;
## the reduce task \[[casecounter.sh|https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/jobs-scripts/casecounter.sh]\]:
### for each subperiod_id, collect all visit IDs in a hash set to determine the number of distinct visits;
### emit the pair Key=subperiod_id, Value=number of distinct visits (the size of the hash set).
# Statistics are gathered by downloading and parsing the log files \[[test.sh|https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/test.sh]\].
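
The map and reduce steps above can be simulated locally in plain Python, without Hadoop or HBase. In this sketch the visits table is replaced by an in-memory list of hypothetical (visit_id, icd10_code, visit_date) tuples, the sub-period index assumes equal-length sub-periods, and the function names are illustrative rather than taken from the casescounter job.

```python
from collections import defaultdict
from datetime import date


def subperiod_id(visit_date: date, start: date, end: date, n: int) -> int:
    """Index (0..n-1) of the sub-period containing visit_date, assuming the
    inclusive range [start, end] is split into n equal-length sub-periods."""
    total_days = (end - start).days + 1
    return (visit_date - start).days * n // total_days


def map_visits(visits, icd10_codes, start, end, n):
    """Map step: for each visit tuple, emit (subperiod_id, visit_id) when the
    ICD10 code is among the requested codes and the visit falls in the period."""
    for visit_id, code, visit_date in visits:
        if code in icd10_codes and start <= visit_date <= end:
            yield subperiod_id(visit_date, start, end, n), visit_id


def reduce_counts(pairs):
    """Reduce step: collect the visit IDs of each sub-period in a set, then
    emit the number of distinct visits per sub-period."""
    groups = defaultdict(set)  # stands in for Hadoop's shuffle/group-by-key
    for key, visit_id in pairs:
        groups[key].add(visit_id)
    return {key: len(ids) for key, ids in sorted(groups.items())}


# Hypothetical rows; the real job scans the HBase visits table instead.
visits = [
    ("v1", "J85.1", date(2013, 1, 10)),
    ("v1", "J85.1", date(2013, 1, 10)),  # same visit emitted twice: counted once
    ("v2", "J85.1", date(2013, 7, 1)),
    ("v3", "J18.9", date(2013, 7, 2)),   # other ICD10 code: filtered out in map
    ("v4", "J85.1", date(2014, 2, 1)),   # outside the period: filtered out in map
]
pairs = map_visits(visits, {"J85.1"}, date(2013, 1, 1), date(2013, 12, 31), 5)
print(reduce_counts(pairs))  # {0: 1, 2: 1}
```

Collecting visit IDs in a set in the reducer, rather than counting the emitted pairs, is what makes a visit that is emitted more than once count only once.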

*Scripts used to execute the evaluation*


[https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/jobs-scripts/casecounter.sh]


[https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/jobs-scripts/test.sh]

*Execution commands*
{noformat}
./casecounter.sh -admission 20130101 -destination ./test4/casescounter.png -discharge 20131231 -hospital wcpit -icd10 J85.1 -periods 4 -width 800 -height 600
./test.sh casesCounter

where:
-admission : date of patient admission to the hospital (YYYYMMDD)
-discharge : date of patient discharge from the hospital (YYYYMMDD)
-destination : output path for the Hadoop job results (must be unique for each job execution)
-hospital : hospital identifier
-icd10 : ICD10 code
-width : width of the chart in pixels
-height : height of the chart in pixels
-periods : number of sub-periods
{noformat}
*Important note:* use a different _\-destination_ for each job execution.
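
Because each execution needs its own _\-destination_, a small wrapper could generate a fresh output path per run. The sketch below builds the argument list for casecounter.sh using the flags listed above; the function name, the run-label scheme (timestamp plus random suffix) and the default chart size are assumptions made for illustration. It only builds the command; running it (for example with subprocess.run) is left to the caller.

```python
import shlex
import uuid
from datetime import datetime
from typing import List, Optional


def casecounter_command(admission: str, discharge: str, hospital: str, icd10: str,
                        periods: int, width: int = 800, height: int = 600,
                        run_label: Optional[str] = None) -> List[str]:
    """Build a casecounter.sh argument list with a unique -destination.

    Hypothetical helper: the width/height defaults are copied from the example
    command above and are not necessarily the script's own defaults.
    """
    if run_label is None:
        # Timestamp plus a random suffix, so repeated runs do not share a destination.
        run_label = f"run-{datetime.now():%Y%m%d-%H%M%S}-{uuid.uuid4().hex[:8]}"
    return [
        "./casecounter.sh",
        "-admission", admission,
        "-destination", f"./{run_label}/casescounter.png",
        "-discharge", discharge,
        "-hospital", hospital,
        "-icd10", icd10,
        "-periods", str(periods),
        "-width", str(width),
        "-height", str(height),
    ]


print(shlex.join(casecounter_command("20130101", "20131231", "wcpit", "J85.1", 4)))
```

Passing run_label="test4" reproduces the example command shown above.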

*Hadoop job*

[https://git.man.poznan.pl/stash/projects/SCAP/repos/mr-jobs/browse/epidemic-jobs/casescounter]