An example of the application of the mathematical model to the assessment of gene selection methods

We report the results obtained by applying SVM-RFE [8] and Golub's method [7] to $100$ artificial datasets generated through the procedure described in "A procedure to compare the effectiveness of gene selection methods", using input parameters that satisfy the Diggle test for the Colon-cancer dataset (see "Parameters of synthetic data").

Since the table containing the percentages of overlap between the set of relevant genes and the sets of genes selected by the two methods is too large to show, we visualize the distribution of the ratio of correctly predicted genes through two boxplots (Fig. 6). As reported in the paper, to compare the two methods we simply count how many times Golub's method achieves better performance than SVM-RFE. Golub's algorithm outperforms SVM-RFE on $57/100$ instances of synthetic data. The mean percentages of correctly predicted genes across the $100$ gene selection experiments are $81\%$ for Golub's method and $71\%$ for SVM-RFE, showing that with these data a simple univariate method works better than the more complex RFE algorithm.
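The comparison described above (an overlap ratio per dataset, a count of wins, and a mean across experiments) can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the experiments. It assumes that the ratio of correctly predicted genes is the fraction of relevant genes recovered by a method, and that a "win" means a strictly higher ratio. The function names `overlap_ratio` and `compare_methods` and the toy gene sets are hypothetical.

```python
from statistics import mean


def overlap_ratio(relevant, selected):
    """Fraction of the relevant genes that a method selected.

    Assumption: the ratio is |relevant & selected| / |relevant|.
    """
    relevant = set(relevant)
    if not relevant:
        raise ValueError("the set of relevant genes must not be empty")
    return len(relevant & set(selected)) / len(relevant)


def compare_methods(ratios_a, ratios_b):
    """Return (strict wins of A over B, mean ratio of A, mean ratio of B).

    Assumption: ties count as a win for neither method.
    """
    if len(ratios_a) != len(ratios_b):
        raise ValueError("both methods must be run on the same datasets")
    wins_a = sum(a > b for a, b in zip(ratios_a, ratios_b))
    return wins_a, mean(ratios_a), mean(ratios_b)


# Toy illustration with three hypothetical datasets (not the paper's data).
relevant = {"g1", "g2", "g3", "g4"}
selected_a = [
    {"g1", "g2", "g3", "g9"},
    {"g1", "g2", "g7", "g8"},
    {"g1", "g2", "g3", "g4"},
]
selected_b = [
    {"g1", "g2", "g8", "g9"},
    {"g1", "g2", "g3", "g8"},
    {"g1", "g5", "g6", "g7"},
]

ratios_a = [overlap_ratio(relevant, s) for s in selected_a]
ratios_b = [overlap_ratio(relevant, s) for s in selected_b]
wins_a, mean_a, mean_b = compare_methods(ratios_a, ratios_b)
print(f"A wins on {wins_a}/{len(ratios_a)} datasets; "
      f"mean A = {mean_a:.2f}, mean B = {mean_b:.2f}")
```

The lists `ratios_a` and `ratios_b` are also the kind of input from which the two boxplots could be drawn.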

Figure 6: Artificial colon-like data. Boxplots of the ratios of correctly selected genes for Golub's method and SVM-RFE.

\includegraphics[width=9cm]{boxplot.eps}