Supplementary information for the paper:
HykGene: A Hybrid Approach for Selecting Marker Genes for Phenotype
Classification using Microarray Gene Expression Data
Yuhang Wang, Fillia Makedon, James Ford and Justin Pearlman.
1. Software available for download.
2. Data used in our paper for download.
3. Supplementary tables.
Table 2. Results on ALL/AML data set using Relief-F

| Classifier | Top 50: All acc. (%) | Top 50: SOM acc. (%) | Top 50: SOM genes | Top 50: HykGene acc. (%) | Top 50: HykGene genes | Top 50: p-value | Top 100: All acc. (%) | Top 100: SOM acc. (%) | Top 100: SOM genes | Top 100: HykGene acc. (%) | Top 100: HykGene genes | Top 100: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-NN | 98.61 | 94.44 | 8 | 100 | 7 | 0.014 | 98.61 | 93.06 | 7 | 100 | 6 | 0.014 |
| SVM | 98.61 | 94.44 | 7 | 100 | 10 | 0.025 | 98.61 | 95.83 | 3 | 98.61 | 12 | 0.025 |
| C4.5 | 79.17 | 93.06 | 7 | 94.44 | 1 | 0.024 | 79.17 | 88.89 | 6 | 94.44 | 1 | 0.024 |
| NB | 97.22 | 94.44 | 8 | 98.61 | 4 | 0.027 | 98.61 | 95.83 | 10 | 100 | 5 | 0.014 |

Table 3. Results on ALL/AML data set using Information Gain

| Classifier | Top 50: All acc. (%) | Top 50: SOM acc. (%) | Top 50: SOM genes | Top 50: HykGene acc. (%) | Top 50: HykGene genes | Top 50: p-value | Top 100: All acc. (%) | Top 100: SOM acc. (%) | Top 100: SOM genes | Top 100: HykGene acc. (%) | Top 100: HykGene genes | Top 100: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-NN | 95.83 | 94.44 | 8 | 98.61 | 25 | 0.027 | 97.22 | 97.22 | 11 | 100 | 13 | 0.007 |
| SVM | 97.22 | 94.44 | 8 | 97.22 | 23 | 0.025 | 95.83 | 95.83 | 3 | 100 | 10 | 0.007 |
| C4.5 | 83.33 | 94.44 | 8 | 94.44 | 1 | 0.021 | 80.56 | 93.06 | 4 | 94.44 | 1 | 0.008 |
| NB | 95.83 | 93.06 | 7 | 100 | 33 | 0.056 | 95.83 | 97.22 | 9 | 100 | 13 | 0.041 |

Table 4. Results on ALL/AML data set using Chi Squared statistic

| Classifier | Top 50: All acc. (%) | Top 50: SOM acc. (%) | Top 50: SOM genes | Top 50: HykGene acc. (%) | Top 50: HykGene genes | Top 50: p-value | Top 100: All acc. (%) | Top 100: SOM acc. (%) | Top 100: SOM genes | Top 100: HykGene acc. (%) | Top 100: HykGene genes | Top 100: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-NN | 95.83 | 95.83 | 11 | 98.61 | 18 | 0.088 | 95.83 | 94.44 | 5 | 98.61 | 5 | 0.008 |
| SVM | 97.22 | 94.44 | 10 | 97.22 | 4 | 0.007 | 97.22 | 95.83 | 14 | 98.61 | 5 | 0.003 |
| C4.5 | 81.94 | 88.89 | 11 | 94.44 | 1 | 0.024 | 83.33 | 94.44 | 3 | 94.44 | 1 | 0.012 |
| NB | 95.83 | 95.83 | 4 | 98.61 | 8 | 0.012 | 95.83 | 97.22 | 11 | 98.61 | 4 | 0.022 |

Table 5. Results on Colon Tumor data set using Relief-F

| Classifier | Top 50: All acc. (%) | Top 50: SOM acc. (%) | Top 50: SOM genes | Top 50: HykGene acc. (%) | Top 50: HykGene genes | Top 50: p-value | Top 100: All acc. (%) | Top 100: SOM acc. (%) | Top 100: SOM genes | Top 100: HykGene acc. (%) | Top 100: HykGene genes | Top 100: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-NN | 85.48 | 79.03 | 5 | 88.71 | 10 | 0.068 | 83.87 | 77.42 | 7 | 90.32 | 5 | 0.004 |
| SVM | 87.10 | 85.48 | 9 | 88.71 | 17 | 0.004 | 88.71 | 79.03 | 9 | 90.32 | 40 | 0.003 |
| C4.5 | 82.26 | 80.65 | 11 | 87.10 | 32 | 0.000 | 82.26 | 72.58 | 6 | 87.10 | 38 | 0.000 |
| NB | 85.48 | 82.3 | 6 | 87.10 | 6 | 0.020 | 83.87 | 77.42 | 9 | 90.32 | 22 | 0.001 |

Table 6. Results on Colon Tumor data set using Information Gain

| Classifier | Top 50: All acc. (%) | Top 50: SOM acc. (%) | Top 50: SOM genes | Top 50: HykGene acc. (%) | Top 50: HykGene genes | Top 50: p-value | Top 100: All acc. (%) | Top 100: SOM acc. (%) | Top 100: SOM genes | Top 100: HykGene acc. (%) | Top 100: HykGene genes | Top 100: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-NN | 87.10 | 85.48 | 6 | 90.32 | 28 | 0.018 | 83.87 | 85.48 | 8 | 88.71 | 54 | 0.025 |
| SVM | 87.10 | 80.65 | 5 | 90.32 | 17 | 0.014 | 87.10 | 82.26 | 14 | 88.71 | 47 | 0.001 |
| C4.5 | 83.87 | 91.94 | 8 | 91.94 | 3 | 0.000 | 85.48 | 90.32 | 8 | 91.94 | 67 | 0.000 |
| NB | 85.48 | 74.19 | 7 | 87.10 | 12 | 0.023 | 79.03 | 70.97 | 5 | 85.48 | 13 | 0.007 |

Table 7. Results on Colon Tumor data set using Chi Squared statistic

| Classifier | Top 50: All acc. (%) | Top 50: SOM acc. (%) | Top 50: SOM genes | Top 50: HykGene acc. (%) | Top 50: HykGene genes | Top 50: p-value | Top 100: All acc. (%) | Top 100: SOM acc. (%) | Top 100: SOM genes | Top 100: HykGene acc. (%) | Top 100: HykGene genes | Top 100: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-NN | 80.64 | 79.03 | 10 | 88.71 | 16 | 0.091 | 83.87 | 75.81 | 7 | 90.32 | 44 | 0.034 |
| SVM | 85.48 | 85.48 | 8 | 87.10 | 17 | 0.005 | 85.48 | 85.48 | 8 | 88.71 | 13 | 0.093 |
| C4.5 | 83.87 | 67.74 | 7 | 87.10 | 9 | 0.017 | 90.32 | 87.10 | 8 | 90.32 | 16 | 0.009 |
| NB | 83.87 | 67.74 | 4 | 85.48 | 16 | 0.073 | 80.64 | 77.42 | 8 | 87.10 | 17 | 0.009 |

Table 8. Results on MLL data set using Relief-F

| Classifier | Top 50: All acc. (%) | Top 50: SOM acc. (%) | Top 50: SOM genes | Top 50: HykGene acc. (%) | Top 50: HykGene genes | Top 50: p-value | Top 100: All acc. (%) | Top 100: SOM acc. (%) | Top 100: SOM genes | Top 100: HykGene acc. (%) | Top 100: HykGene genes | Top 100: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-NN | 95.83 | 90.28 | 7 | 97.22 | 36 | 0.062 | 95.83 | 72.22 | 3 | 98.61 | 15 | 0.013 |
| SVM | 97.22 | 91.67 | 7 | 97.22 | 30 | 0.019 | 98.61 | 70.83 | 3 | 100 | 39 | 0.054 |
| C4.5 | 93.06 | 86.11 | 9 | 95.83 | 20 | 0.094 | 91.67 | 73.61 | 10 | 94.44 | 25 | 0.057 |
| NB | 97.22 | 91.67 | 8 | 97.22 | 16 | 0.261 | 97.22 | 73.61 | 3 | 98.61 | 38 | 0.061 |

Table 9. Results on MLL data set using Information Gain

| Classifier | Top 50: All acc. (%) | Top 50: SOM acc. (%) | Top 50: SOM genes | Top 50: HykGene acc. (%) | Top 50: HykGene genes | Top 50: p-value | Top 100: All acc. (%) | Top 100: SOM acc. (%) | Top 100: SOM genes | Top 100: HykGene acc. (%) | Top 100: HykGene genes | Top 100: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-NN | 90.28 | 83.33 | 4 | 97.22 | 22 | 0.019 | 90.27 | 87.50 | 5 | 97.22 | 14 | 0.014 |
| SVM | 95.83 | 86.11 | 3 | 98.61 | 22 | 0.015 | 97.22 | 86.11 | 5 | 97.22 | 14 | 0.080 |
| C4.5 | 94.44 | 84.72 | 10 | 94.44 | 31 | 0.083 | 91.67 | 83.33 | 4 | 91.67 | 15 | 0.121 |
| NB | 94.44 | 86.11 | 7 | 95.80 | 20 | 0.174 | 93.06 | 86.11 | 5 | 97.22 | 11 | 0.019 |

Table 10. Results on MLL data set using Chi Squared statistic

| Classifier | Top 50: All acc. (%) | Top 50: SOM acc. (%) | Top 50: SOM genes | Top 50: HykGene acc. (%) | Top 50: HykGene genes | Top 50: p-value | Top 100: All acc. (%) | Top 100: SOM acc. (%) | Top 100: SOM genes | Top 100: HykGene acc. (%) | Top 100: HykGene genes | Top 100: p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-NN | 95.83 | 81.94 | 6 | 97.22 | 38 | 0.086 | 94.44 | 87.50 | 4 | 100 | 26 | 0.012 |
| SVM | 95.83 | 86.11 | 4 | 98.61 | 25 | 0.276 | 97.22 | 91.67 | 6 | 98.61 | 11 | 0.030 |
| C4.5 | 86.11 | 79.17 | 4 | 91.67 | 8 | 0.005 | 91.67 | 86.11 | 7 | 91.67 | 14 | 0.040 |
| NB | 94.44 | 83.33 | 5 | 95.83 | 18 | 0.242 | 94.44 | 88.89 | 11 | 97.22 | 15 | 0.357 |