Supplementary information for the paper:


HykGene: A Hybrid Approach for Selecting Marker Genes for Phenotype Classification using Microarray Gene Expression Data

Yuhang Wang, Fillia Makedon, James Ford and Justin Pearlman.


1. Software available for download.

2. Data used in our paper for download.

3. Supplementary tables.

Table 2. Results on ALL/AML data set using Relief-F

            Relief-F
            50 top-ranked genes                          100 top-ranked genes
            All 50  SOM           HykGene                All 100  SOM           HykGene
Classifier  acc.    acc.   genes  acc.   genes  p-value  acc.     acc.   genes  acc.   genes  p-value
k-NN        98.61   94.44  8      100    7      0.014    98.61    93.06  7      100    6      0.014
SVM         98.61   94.44  7      100    10     0.025    98.61    95.83  3      98.61  12     0.025
C4.5        79.17   93.06  7      94.44  1      0.024    79.17    88.89  6      94.44  1      0.024
NB          97.22   94.44  8      98.61  4      0.027    98.61    95.83  10     100    5      0.014

Table 3. Results on ALL/AML data set using Information Gain

            Information Gain
            50 top-ranked genes                          100 top-ranked genes
            All 50  SOM           HykGene                All 100  SOM           HykGene
Classifier  acc.    acc.   genes  acc.   genes  p-value  acc.     acc.   genes  acc.   genes  p-value
k-NN        95.83   94.44  8      98.61  25     0.027    97.22    97.22  11     100    13     0.007
SVM         97.22   94.44  8      97.22  23     0.025    95.83    95.83  3      100    10     0.007
C4.5        83.33   94.44  8      94.44  1      0.021    80.56    93.06  4      94.44  1      0.008
NB          95.83   93.06  7      100    33     0.056    95.83    97.22  9      100    13     0.041

Table 4. Results on ALL/AML data set using Chi Squared statistic

            Chi Squared statistic
            50 top-ranked genes                          100 top-ranked genes
            All 50  SOM           HykGene                All 100  SOM           HykGene
Classifier  acc.    acc.   genes  acc.   genes  p-value  acc.     acc.   genes  acc.   genes  p-value
k-NN        95.83   95.83  11     98.61  18     0.088    95.83    94.44  5      98.61  5      0.008
SVM         97.22   94.44  10     97.22  4      0.007    97.22    95.83  14     98.61  5      0.003
C4.5        81.94   88.89  11     94.44  1      0.024    83.33    94.44  3      94.44  1      0.012
NB          95.83   95.83  4      98.61  8      0.012    95.83    97.22  11     98.61  4      0.022

Table 5. Results on Colon Tumor data set using Relief-F

            Relief-F
            50 top-ranked genes                          100 top-ranked genes
            All 50  SOM           HykGene                All 100  SOM           HykGene
Classifier  acc.    acc.   genes  acc.   genes  p-value  acc.     acc.   genes  acc.   genes  p-value
k-NN        85.48   79.03  5      88.71  10     0.068    83.87    77.42  7      90.32  5      0.004
SVM         87.10   85.48  9      88.71  17     0.004    88.71    79.03  9      90.32  40     0.003
C4.5        82.26   80.65  11     87.10  32     0.000    82.26    72.58  6      87.10  38     0.000
NB          85.48   82.3   6      87.10  6      0.020    83.87    77.42  9      90.32  22     0.001

Table 6. Results on Colon Tumor data set using Information Gain

            Information Gain
            50 top-ranked genes                          100 top-ranked genes
            All 50  SOM           HykGene                All 100  SOM           HykGene
Classifier  acc.    acc.   genes  acc.   genes  p-value  acc.     acc.   genes  acc.   genes  p-value
k-NN        87.10   85.48  6      90.32  28     0.018    83.87    85.48  8      88.71  54     0.025
SVM         87.10   80.65  5      90.32  17     0.014    87.10    82.26  14     88.71  47     0.001
C4.5        83.87   91.94  8      91.94  3      0.000    85.48    90.32  8      91.94  67     0.000
NB          85.48   74.19  7      87.1   12     0.023    79.03    70.97  5      85.48  13     0.007

Table 7. Results on Colon Tumor data set using Chi Squared statistic

            Chi Squared statistic
            50 top-ranked genes                          100 top-ranked genes
            All 50  SOM           HykGene                All 100  SOM           HykGene
Classifier  acc.    acc.   genes  acc.   genes  p-value  acc.     acc.   genes  acc.   genes  p-value
k-NN        80.64   79.03  10     88.71  16     0.091    83.87    75.81  7      90.32  44     0.034
SVM         85.48   85.48  8      87.10  17     0.005    85.48    85.48  8      88.71  13     0.093
C4.5        83.87   67.74  7      87.10  9      0.017    90.32    87.10  8      90.32  16     0.009
NB          83.87   67.74  4      85.48  16     0.073    80.64    77.42  8      87.10  17     0.009

Table 8. Results on MLL data set using Relief-F

            Relief-F
            50 top-ranked genes                          100 top-ranked genes
            All 50  SOM           HykGene                All 100  SOM           HykGene
Classifier  acc.    acc.   genes  acc.   genes  p-value  acc.     acc.   genes  acc.   genes  p-value
k-NN        95.83   90.28  7      97.22  36     0.062    95.83    72.22  3      98.61  15     0.013
SVM         97.22   91.67  7      97.22  30     0.019    98.61    70.83  3      100    39     0.054
C4.5        93.06   86.11  9      95.83  20     0.094    91.67    73.61  10     94.44  25     0.057
NB          97.22   91.67  8      97.22  16     0.261    97.22    73.61  3      98.61  38     0.061

Table 9. Results on MLL data set using Information Gain

            Information Gain
            50 top-ranked genes                          100 top-ranked genes
            All 50  SOM           HykGene                All 100  SOM           HykGene
Classifier  acc.    acc.   genes  acc.   genes  p-value  acc.     acc.   genes  acc.   genes  p-value
k-NN        90.28   83.33  4      97.22  22     0.019    90.27    87.50  5      97.22  14     0.014
SVM         95.83   86.11  3      98.61  22     0.015    97.22    86.11  5      97.22  14     0.080
C4.5        94.44   84.72  10     94.44  31     0.083    91.67    83.33  4      91.67  15     0.121
NB          94.44   86.11  7      95.80  20     0.174    93.06    86.11  5      97.22  11     0.019

Table 10. Results on MLL data set using Chi Squared statistic

            Chi Squared statistic
            50 top-ranked genes                          100 top-ranked genes
            All 50  SOM           HykGene                All 100  SOM           HykGene
Classifier  acc.    acc.   genes  acc.   genes  p-value  acc.     acc.   genes  acc.   genes  p-value
k-NN        95.83   81.94  6      97.22  38     0.086    94.44    87.50  4      100    26     0.012
SVM         95.83   86.11  4      98.61  25     0.276    97.22    91.67  6      98.61  11     0.030
C4.5        86.11   79.17  4      91.67  8      0.005    91.67    86.11  7      91.67  14     0.040
NB          94.44   83.33  5      95.83  18     0.242    94.44    88.89  11     97.22  15     0.357