Reporting the results on ALL-IDB
A system that identifies the presence of blast cells in an input image can be organized into different structures of modules. For example, it can perform the following steps: (i) the identification of white cells in the image, (ii) the selection of lymphocytes, (iii) the classification of tumor cells. Each step typically contains segmentation/classification algorithms. In order to measure and fairly compare the identification accuracy of different structures of modules, we propose a benchmark approach partitioned into two tests, as follows:
- Cell test - the benchmark accounts for the classification of single cells as blast or not (the test is positive if the considered cell is a blast cell);
- Image level - the whole image is classified (the test is positive if the considered image contains at least one blast cell).
For each level of the benchmark, the confusion matrix of each test can be computed, where the term "elements" refers to the cells/images of the corresponding level:
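The relation between the two levels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `image_is_positive` and the sample data are hypothetical, and an image with no cells is assumed to be negative.

```python
from typing import Iterable


def image_is_positive(cell_is_blast: Iterable[bool]) -> bool:
    """Image-level test: positive if at least one cell is a blast cell."""
    return any(cell_is_blast)


# Hypothetical per-cell labels for three images (True = blast cell).
images = {
    "img_a": [False, False, True],
    "img_b": [False, False],
    "img_c": [],  # assumption: an image with no cells counts as negative
}

image_labels = {name: image_is_positive(cells) for name, cells in images.items()}
```

The same rule applies to ground-truth labels and to predicted labels, so the image-level confusion matrix can be built from cell-level annotations when they are available.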
- True positives (TP) - the number of elements correctly classified as positive by the test;
- True negatives (TN) - the number of elements correctly classified as negative by the test;
- False positives (FP) - also known as type I errors, the number of elements classified as positive by the test that are actually negative;
- False negatives (FN) - also known as type II errors, the number of elements classified as negative by the test that are actually positive.
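As an illustration, the four entries of the confusion matrix can be counted from paired ground-truth and predicted labels. The sketch below assumes boolean labels (True = positive). The names `ConfusionCounts` and `confusion_counts` are hypothetical and not part of ALL-IDB.

```python
from typing import NamedTuple, Sequence


class ConfusionCounts(NamedTuple):
    tp: int  # predicted positive, actually positive
    tn: int  # predicted negative, actually negative
    fp: int  # predicted positive, actually negative (type I error)
    fn: int  # predicted negative, actually positive (type II error)


def confusion_counts(truth: Sequence[bool], predicted: Sequence[bool]) -> ConfusionCounts:
    """Count TP, TN, FP and FN over the elements (cells or images) of one level."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = tn = fp = fn = 0
    for actual, guess in zip(truth, predicted):
        if guess and actual:
            tp += 1
        elif not guess and not actual:
            tn += 1
        elif guess and not actual:
            fp += 1
        else:
            fn += 1
    return ConfusionCounts(tp, tn, fp, fn)


# Hypothetical labels for five elements.
truth = [True, True, False, False, True]
predicted = [True, False, False, True, True]
counts = confusion_counts(truth, predicted)
```

The same function serves both levels of the benchmark, since only the meaning of an "element" changes.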
Using these definitions, it is possible to compute the following standard parameters:
- Sensitivity - the probability of correctly classifying elements with ALL, equal to TP/(TP+FN);
- Specificity - the probability of correctly classifying elements without ALL, computed as TN/(TN+FP);
- Classification error - the total error in an analysis level, defined as CE = FP + FN.
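The three parameters above follow directly from the confusion matrix counts. The sketch below is a minimal illustration. The function names and the example counts are hypothetical, and returning NaN when a denominator is zero is an assumption, not something the benchmark prescribes.

```python
def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN); NaN if there are no positive elements (assumption)."""
    total = tp + fn
    return tp / total if total else float("nan")


def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP); NaN if there are no negative elements (assumption)."""
    total = tn + fp
    return tn / total if total else float("nan")


def classification_error(fp: int, fn: int) -> int:
    """CE = FP + FN, the total number of misclassified elements."""
    return fp + fn


# Hypothetical counts for one analysis level.
tp, tn, fp, fn = 40, 50, 4, 6
sens = sensitivity(tp, fn)          # 40 / 46
spec = specificity(tn, fp)          # 50 / 54
ce = classification_error(fp, fn)   # 4 + 6
```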
Table 1 is an example template for reporting the performance of an image processing method tested at the two ALL-IDB levels of analysis (cell and image).
Figures of merit | Classification element: cells | Classification element: images |
---|---|---|
TP % | | |
TN % | | |
FP % | | |
FN % | | |
Misclassification % | | |
Sensitivity % | | |
Specificity % | | |
If the tested method requires calibration/training data, the results must be evaluated on the remaining data of ALL-IDB (e.g., using the N-fold cross-validation technique). In case of repeated tests, it is important to report the standard deviation of the obtained classification error and figures of merit.
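One possible way to organize such an evaluation is sketched below, using only the Python standard library. Everything in it is an assumption made for illustration: the helper names, the fixed random seed, and the callback `evaluate_fold`, which is a hypothetical function that trains on the training indices and returns the classification error (FP + FN) measured on the held-out indices. The majority-class baseline stands in for a real method.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def n_fold_splits(n_items: int, n_folds: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle the indices once and return (train, test) index lists per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    splits = []
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


def cross_validated_error(
    n_items: int,
    evaluate_fold: Callable[[Sequence[int], Sequence[int]], float],
    n_folds: int = 5,
) -> Tuple[float, float]:
    """Return the mean and sample standard deviation of the per-fold error."""
    errors = [evaluate_fold(train, test) for train, test in n_fold_splits(n_items, n_folds)]
    return statistics.mean(errors), statistics.stdev(errors)


# Hypothetical ground truth: 6 positive and 14 negative elements.
labels = [True] * 6 + [False] * 14


def majority_baseline(train: Sequence[int], test: Sequence[int]) -> int:
    """Toy method: predict the majority training class, return CE on the test fold."""
    predict_positive = 2 * sum(labels[i] for i in train) > len(train)
    return sum(labels[i] != predict_positive for i in test)


mean_ce, std_ce = cross_validated_error(len(labels), majority_baseline, n_folds=5)
```

Each element is used for testing exactly once and never appears in the training set of its own fold, so the reported error is measured only on data the method has not seen during training.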