HykGene is a mixture of Matlab,
Java and Perl code.
Please make sure your computer system meets the following requirements.
System requirements:
PC with at least 512MB RAM.
Matlab 6.0 or later with the Statistics Toolbox.
Java 1.4 installed.
Follow these steps to install HykGene:
Download hykgene.zip to your computer and unzip
it to a directory, for example C:\tools\hykgene.
Add the HykGene directory to the Matlab path. If you don't know how to do
this, refer to Matlab's user manual.
Add the HykGene directory to the Java CLASSPATH environment variable.
Download and install PRTools 3.1.7 for Matlab, then
add the PRTools directory to the Matlab path. The latest
version of PRTools may also work, but I haven't checked.
If you would like to use the SOM clustering option, download and install
the SOM Toolbox for Matlab, then add the SOM Toolbox
directory to the Matlab path.
Download and install Weka.
Add weka.jar to the Java CLASSPATH environment variable.
That's it!
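If HykGene later fails to find Weka, the CLASSPATH setting is a likely cause. The following is a minimal sketch for checking it, not part of HykGene or Weka: the class name ClasspathCheck and the method isOnClasspath are hypothetical, and the code assumes a modern JDK (Java 7 or later) rather than the Java 1.4 minimum listed above. It looks for weka.core.Instances, a core Weka class, and prints the CLASSPATH variable as the Java process sees it.

```java
// Hypothetical helper for illustration; not shipped with HykGene or Weka.
class ClasspathCheck {

    // Returns true if the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: locate the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // weka.core.Instances lives in weka.jar
        System.out.println("Weka found: " + isOnClasspath("weka.core.Instances"));
        // May print null if the variable is not set for this process
        System.out.println("CLASSPATH = " + System.getenv("CLASSPATH"));
    }
}
```

Compile and run it from the same command prompt you use to start Matlab. Note that Matlab may manage its own Java class path, so a passing check here does not guarantee that Matlab sees the same jars.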
How to run HykGene
Input files: HykGene needs two input files: a gene expression
file in the ARFF
format and a gene information file in tab-delimited text format. Here are
examples of the ARFF file and the gene information file: AMLALL.arff
and AMLALLgeneinfo.txt. In
the gene information file, the first column lists the feature names as used
in the ARFF file; the second column lists the corresponding probe IDs; and the
third column lists the corresponding gene descriptions. You can prepare these
files using Microsoft Excel.
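Files exported from a spreadsheet sometimes end up with missing or extra columns, so it can help to check the gene information file before running HykGene. The sketch below is one way to do that in Java, under stated assumptions: the class GeneInfoCheck and its methods are hypothetical and not part of HykGene, the file is assumed to have no header row (the description above does not say), and the code needs Java 7 or later. It only checks that every non-blank line has exactly three tab-separated columns and collects the feature names from the first column.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not shipped with HykGene.
class GeneInfoCheck {

    // Splits one line into feature name, probe ID, and gene description.
    // Throws if the line does not have exactly three tab-separated columns.
    static String[] parseLine(String line) {
        String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
        if (cols.length != 3) {
            throw new IllegalArgumentException(
                "expected 3 tab-separated columns, found " + cols.length);
        }
        return cols;
    }

    // Returns the feature names (first column) of all non-blank lines.
    static List<String> featureNames(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            names.add(parseLine(line)[0]);
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // The file path is passed on the command line, e.g. AMLALLgeneinfo.txt.
        // readAllLines assumes UTF-8 text.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(featureNames(lines).size() + " features listed");
    }
}
```

The returned feature names can then be compared by hand, or with further code, against the attribute names declared in the ARFF file, since the two are expected to match.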
Typical usage:
hykgene(arfffile, geneinfofile, m, rankmethod, classifier, cvfold,
clustermethod, selectedgenes)
where
arfffile: the input gene expression file in ARFF format;
geneinfofile: a tab-delimited text file containing gene information;
m: initially select top-m genes;
rankmethod: name of the gene ranking method. Valid options are: 'chi2' (Chi
Squared), 'rf' (Relief-F), 'ig' (Information Gain);
classifier: name of the classification method to use. Valid options are:
'knn' (K-Nearest Neighbor), 'svm' (linear Support Vector Machine), 'c45' (C4.5
classification tree), 'nb' (Naive Bayes);
cvfold: number of cross-validation folds;
clustermethod: name of the clustering method to use (the example below uses 'hc');
selectedgenes: name of the file in which to store the selected marker genes.
For example, you can type in Matlab:
hykgene('AMLALL', 'AMLALLgeneinfo.txt', 100, 'chi2', 'svm', 72, 'hc', 'amlallhkgenes.xls')
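As a result, you will get the "amlallhkgenes.xls" file containing the
marker genes picked by HykGene, plus a second file containing all of the
top-m genes as ranked by the Chi Squared gene ranking method. The name of
the second file appears to include the ranking method and m; for m = 50,
for example, it is "AMLALLchi2top50genes.xls".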
Frequently Asked Questions
I use R instead of Matlab. Will
you release an R version?
Yes, we are also moving to R. A pure Java version is also planned.
I have some data in .xls files. How can I convert it to ARFF?
It depends on how the data is organized in the .xls files. The easiest way
we have found is to convert it to the CSV
format first, and then use the following from the command line: