
Evaluator(s)

Tomasz Hofmann (PSNC)

Evaluation points

The goal of this evaluation was to run an analysis on the Medical Data Center dataset and compute statistics on the gender of patients treated in a given period. The period to analyse is passed to the analysis algorithm as an input parameter. The selected metric is the number of objects per second, that is, the number of records processed per second.

Assessment of measurable points
| Metric | Description | Metric baseline | Metric goal | July 21, 2014 [Test 1] | July 22, 2014 [Test 2] | July 31, 2014 [Test 3] |
|---|---|---|---|---|---|---|
| number of objects per second | number of records processed per second | - | - | 1798 [obj/s] | 1600 [obj/s] | 984 [obj/s] |

Note: Metrics must be registered in the metrics catalogue

Visualisation of results

The chart below presents the results of the analysis for Test 2. Colours indicate the gender of the patients. Each colour on the pie chart has a corresponding legend entry of the form Y = Z [P], where Y is the name of the gender, Z is the number of patient visits in the analysed period, and P is the percentage of those visits in the overall total.
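As an illustration of the entry format, the following sketch builds Y = Z [P] strings from per-gender visit counts. The gender names and counts are hypothetical placeholders rather than results from the evaluation, and the one-decimal percentage format is an assumption.

```python
def legend_entries(visits_by_gender):
    """Build 'Y = Z [P]' legend strings from a mapping gender -> visit count."""
    total = sum(visits_by_gender.values())
    entries = []
    for gender, visits in visits_by_gender.items():
        # Share of this gender's visits in all visits of the analysed period.
        share = 100.0 * visits / total if total else 0.0
        entries.append(f"{gender} = {visits} [{share:.1f}%]")
    return entries


# Hypothetical counts, for illustration only (not taken from the tests).
example = {"female": 60, "male": 40}
print(legend_entries(example))
```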

     

Table 1 presents the processing time of the whole job for each test. Tables 2 and 3 give the execution time and the number of processed rows for the map and reduce tasks, respectively. The statistics in the tables and the measurable points show that: a) the processing time depends on the number of records to be processed; b) the more records there are to process, the better the performance (more rows are processed per second).

Table 1. Overall statistics

| Parameter | Test 1 | Test 2 | Test 3 |
|---|---|---|---|
| Analyzed period | 01.07.2012-01.07.2014 | 01.01.2013-31.12.2013 | 01.01.2014-01.05.2014 |
| Processing time | 95 [s] | 94 [s] | 10 [s] |

Table 2. Statistics for map task

| Parameter | Test 1 | Test 2 | Test 3 |
|---|---|---|---|
| Processing time (for all records) | 93 [s] | 92 [s] | 9.6 [s] |
| Number of records | 168 021 | 148 333 | 9 462 |

Table 3. Statistics for reduce task

| Parameter | Test 1 | Test 2 | Test 3 |
|---|---|---|---|
| Processing time (for all records) | 2.17 [s] | 2.01 [s] | 0.46 [s] |
| Number of records | 15 331 | 14 164 | 895 |
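
The observation that larger inputs yield higher throughput can be checked against Table 2. The sketch below divides the number of map-task records by the map-task processing time. Whether the reported metric values were derived in exactly this way is an assumption, so the results should only be expected to approximate the values listed under the measurable points.

```python
# Map-task figures copied from Table 2: (number of records, time in seconds).
MAP_STATS = {
    "Test 1": (168_021, 93.0),
    "Test 2": (148_333, 92.0),
    "Test 3": (9_462, 9.6),
}


def throughput(records, seconds):
    """Return the number of records processed per second."""
    return records / seconds


for test, (records, seconds) in MAP_STATS.items():
    print(f"{test}: {throughput(records, seconds):.0f} obj/s")
```

Under this assumption the three tests come out at roughly 1807, 1612 and 986 obj/s, close to the reported 1798, 1600 and 984 obj/s, with the smallest input (Test 3) clearly the slowest per record.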

Technical details

Workflow

The experiment consists of the following steps, which follow the MapReduce algorithm schema [gender.sh]:

  1. the map task [gender.sh]:
    1. for each tuple in the visits table:
      1. if the visit belongs to the given period, check the patient's sex and add the pair Key=sex_id, Value=visit_id to the context
  2. the reduce task [gender.sh]:
    1. for each value of sex_id, aggregate all visit ids in a hash set in order to find the number of distinct visits
    2. produce the pair Key=sex_id, Value=number of distinct visits (size of the hash set)
  3. statistics are gathered by downloading and parsing log files [test.sh]
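
The map and reduce steps above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the logic, not the actual Hadoop job: the field names (`visit_id`, `sex_id`, `date`), the sample rows, and the inclusive treatment of the period boundaries are all assumptions.

```python
from collections import defaultdict
from datetime import date


def map_visits(visits, start, end):
    """Map step: emit (sex_id, visit_id) for every visit inside the period."""
    for visit in visits:
        if start <= visit["date"] <= end:
            yield visit["sex_id"], visit["visit_id"]


def reduce_visits(pairs):
    """Reduce step: count distinct visit ids per sex_id using a set."""
    distinct = defaultdict(set)
    for sex_id, visit_id in pairs:
        distinct[sex_id].add(visit_id)
    return {sex_id: len(ids) for sex_id, ids in distinct.items()}


# Hypothetical rows; the field names are assumptions, not the real schema.
visits = [
    {"visit_id": 1, "sex_id": "F", "date": date(2014, 1, 10)},
    {"visit_id": 1, "sex_id": "F", "date": date(2014, 1, 10)},  # duplicate row
    {"visit_id": 2, "sex_id": "M", "date": date(2014, 2, 3)},
    {"visit_id": 3, "sex_id": "F", "date": date(2013, 12, 31)},  # outside period
]
counts = reduce_visits(map_visits(visits, date(2014, 1, 1), date(2014, 5, 1)))
print(counts)
```

Collecting visit ids in a set is what makes repeated rows for the same visit count only once, which matches the "number of distinct visits" produced by the reduce task.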

Scripts used to execute the evaluation

https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/jobs-scripts/gender.sh

https://git.man.poznan.pl/stash/projects/SCAP/repos/test-scripts/browse/epidemic-jobs-tests/test.sh

Execution commands

./gender.sh -hospital wcpit -destination ./test5/gender.png -admission 20140101 -discharge 20140701 -width 800 -height 600
./test.sh gender

where:
-admission : date of patient admission to hospital
-discharge : date of patient discharge from hospital
-destination : folder for Hadoop job results (only one per job execution)
-width : width of the chart in pixels
-height : height of the chart in pixels

Important note: please change the -destination for each job execution.
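
One way to follow this note is to generate the -destination value for each run. The sketch below only assembles the argument list for gender.sh from the flags shown above; the helper name `build_gender_command`, the `./test-<run_id>/gender.png` naming scheme, and the timestamp-based run id are assumptions made for illustration.

```python
from datetime import datetime


def build_gender_command(admission, discharge, run_id=None,
                         hospital="wcpit", width=800, height=600):
    """Return the gender.sh argument list with a per-run -destination."""
    if run_id is None:
        # Assumed scheme: a timestamp makes each run's destination unique.
        run_id = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = f"./test-{run_id}/gender.png"
    return [
        "./gender.sh",
        "-hospital", hospital,
        "-destination", destination,
        "-admission", admission,
        "-discharge", discharge,
        "-width", str(width),
        "-height", str(height),
    ]


print(" ".join(build_gender_command("20140101", "20140701", run_id="run1")))
```

The returned list can be passed as-is to `subprocess.run` on a machine where gender.sh is available.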

Hadoop job

https://git.man.poznan.pl/stash/projects/SCAP/repos/mr-jobs/browse/epidemic-jobs/gender
