1. HykGene: A Hybrid Approach for Selecting Marker Genes for Phenotype Classification using Microarray Gene Expression Data
Why? Recent studies have shown that microarray
gene expression data is useful for phenotype classification of many
diseases. In this classification problem, the number of features
(genes) greatly exceeds the number of instances (tissue samples).
It has been shown that selecting a small set of informative genes
can lead to improved classification accuracy. Many approaches have
been proposed for this gene selection problem. Most previous
gene ranking methods select the 50-200 top-ranked genes, and
these genes are often highly correlated. Our HykGene tool aims to
select a small set of non-redundant marker genes that are most relevant
for the classification task.
How? To achieve this goal, we developed a novel
hybrid approach that combines gene ranking and clustering analysis.
In this approach, we first apply feature filtering algorithms to
select a set of top-ranked genes, and then apply hierarchical clustering
on these genes to generate a dendrogram. Finally, the dendrogram
is analyzed by a sweep-line algorithm, and marker genes are selected
by collapsing dense clusters.
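The three steps above (rank, cluster, collapse) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HykGene implementation: the signal-to-noise filter, the 1 - |Pearson correlation| distance, average linkage, and the fixed cut height `cut` (used here in place of the sweep-line analysis of the dendrogram) are all choices made for this sketch, and every function name and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def column(X, j):
    """Expression values of gene j across all samples (rows of X)."""
    return [row[j] for row in X]

def snr_score(values, labels):
    """Assumed filter: |difference of class means| / sum of class std devs."""
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)

def pearson(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den if den else 0.0

def rank_genes(X, labels, top_k):
    """Step 1: keep the top_k genes by filter score, best first."""
    scores = {j: snr_score(column(X, j), labels) for j in range(len(X[0]))}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def cluster_genes(X, genes, cut):
    """Step 2: average-linkage agglomeration, stopped at a fixed height.

    The fixed `cut` is a stand-in for the sweep-line step, whose details
    are not given in the text above.
    """
    dist = {(g, h): 1 - abs(pearson(column(X, g), column(X, h)))
            for g in genes for h in genes if g != h}
    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        d, i, j = min(
            (mean(dist[(g, h)] for g in clusters[i] for h in clusters[j]), i, j)
            for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        if d > cut:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def select_markers(X, labels, top_k, cut):
    """Step 3: collapse each cluster to its best-ranked gene."""
    ranked = rank_genes(X, labels, top_k)
    return [min(c, key=ranked.index) for c in cluster_genes(X, ranked, cut)]

# Hypothetical toy data: 6 samples x 4 genes, two classes.
X_toy = [[1.0, 2.1, 1.0, 2.0], [1.2, 2.5, 5.0, 3.0], [0.9, 1.7, 3.0, 2.5],
         [3.0, 6.0, 4.0, 2.4], [3.1, 6.3, 8.0, 2.1], [2.9, 5.7, 6.0, 3.0]]
y_toy = [0, 0, 0, 1, 1, 1]
markers = select_markers(X_toy, y_toy, top_k=3, cut=0.1)
```

In the toy data, genes 0 and 1 are nearly collinear, so they fall into one cluster and only the better-ranked gene 0 survives, while gene 2 is kept as a separate marker. A tighter or looser `cut` changes how aggressively redundant genes are collapsed.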
Does it work? An empirical study on three public
data sets shows that our approach selects relatively
few marker genes while offering the same or even better leave-one-out
cross-validation accuracy than approaches that use all of
the top-ranked genes directly for classification. For details, see
our paper and its supplementary information site.
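For readers unfamiliar with the metric, leave-one-out cross-validation accuracy can be sketched as follows. This is a minimal illustration, assuming a 1-nearest-neighbour classifier with squared Euclidean distance restricted to the selected genes; the classifiers actually used in the study are not stated here, and the function name and toy data are hypothetical. The sketch also takes the gene set as given rather than re-selecting it inside each fold.

```python
def loocv_accuracy(X, labels, genes):
    """Hold out each sample once, classify it from the rest, count hits.

    X      : list of samples, each a list of expression values
    labels : class label per sample
    genes  : indices of the marker genes used for classification
    """
    def dist(u, v):
        # Squared Euclidean distance over the selected genes only.
        return sum((u[g] - v[g]) ** 2 for g in genes)

    correct = 0
    for i, held_out in enumerate(X):
        # Nearest remaining sample (assumed 1-NN classifier).
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: dist(held_out, X[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(X)

# Hypothetical toy data: gene 0 separates the classes, gene 1 does not.
X_demo = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0],
          [3.0, 4.0], [3.1, 8.0], [2.9, 6.0]]
y_demo = [0, 0, 0, 1, 1, 1]
acc = loocv_accuracy(X_demo, y_demo, genes=[0])
```

Comparing `loocv_accuracy` for a small marker set against the full top-ranked list is the kind of comparison the paragraph above describes.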
How to get it? HykGene is freely available for
academic users. You can get it here.
How to run it? You can refer to the installation
guide, the user manual,
and the FAQ.
2. SWTi-DCNDen: A Stationary Wavelet Denoising Method for Unequally Spaced Array-Based DNA Copy Number Data
Why? High-resolution DNA copy number data can be derived from array-based
comparative genomic hybridization (array CGH) and
single-nucleotide polymorphism (SNP) arrays. Such data is
typically very noisy. A few denoising
schemes have previously been applied to DNA copy number data. Among
these, wavelet denoising has been shown to perform
best for this application. However, all of the
previous denoising methods for DNA copy number data, including the
wavelet-based method, ignore the physical distances between
probes and assume that the probes are uniformly spaced. Applying a
method that assumes uniform spacing to unequally
spaced DNA copy number data can give incorrect results.
How? To address this issue, we developed
a novel stationary wavelet transform (SWT) denoising scheme using interpolation
(hence the "i" in SWTi) for unequally spaced array-based
DNA copy number data. For this scheme, we also extended
the covariance model for discrete wavelet transform (DWT) coefficients
of correlated noise to the SWT case.
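The idea of interpolating before a stationary wavelet transform can be sketched as below. This is a simplified illustration under several assumptions, not the SWTi implementation: it uses linear interpolation onto a uniform grid, a single-level undecimated Haar transform with periodic boundaries, and plain soft thresholding with a user-supplied threshold. The covariance model for correlated noise is not reproduced, and all function names are hypothetical.

```python
from bisect import bisect_right

def interpolate_to_grid(positions, values, n_grid):
    """Linearly resample unequally spaced probes onto n_grid uniform points.

    positions must be strictly increasing (e.g. genomic coordinates).
    """
    lo, hi = positions[0], positions[-1]
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + k * step for k in range(n_grid)]
    resampled = []
    for g in grid:
        # Index of the probe interval containing g, clamped to valid range.
        i = min(max(bisect_right(positions, g) - 1, 0), len(positions) - 2)
        x0, x1 = positions[i], positions[i + 1]
        t = (g - x0) / (x1 - x0)
        resampled.append(values[i] + t * (values[i + 1] - values[i]))
    return grid, resampled

def soft(x, threshold):
    """Soft thresholding: shrink x toward zero by threshold."""
    if x > threshold:
        return x - threshold
    if x < -threshold:
        return x + threshold
    return 0.0

def swt_haar_denoise(signal, threshold):
    """One-level undecimated Haar transform, threshold details, invert.

    Periodic boundary handling is an assumption of this sketch.
    """
    n = len(signal)
    approx = [(signal[i] + signal[(i + 1) % n]) / 2 for i in range(n)]
    detail = [soft((signal[i] - signal[(i + 1) % n]) / 2, threshold)
              for i in range(n)]
    # Inverse: average the two shifted reconstructions of each sample.
    return [((approx[i] + detail[i]) + (approx[i - 1] - detail[i - 1])) / 2
            for i in range(n)]

def denoise_unequally_spaced(positions, values, n_grid, threshold):
    """Interpolate to a uniform grid, then denoise on that grid."""
    grid, resampled = interpolate_to_grid(positions, values, n_grid)
    return grid, swt_haar_denoise(resampled, threshold)
```

With `threshold=0` the transform reconstructs the resampled signal unchanged, which is a useful sanity check. A practical version would use several decomposition levels and a data-driven threshold.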
Does it work? Empirical results on synthetic
data generated by a model based on real array CGH data showed
that the proposed SWTi denoising scheme outperformed the previous
MODWT-based wavelet denoising method by 4.6% to 12.7%, as measured
by the overall root mean squared error. Experiments
on a public array CGH data set also confirmed the applicability
of our SWTi denoising scheme to real array CGH data. For details,
see the paper and supplementary information.
How to get it? Software with both graphical
and command-line user interfaces for MS Windows is freely
available for academic users. You can download it here
(Build 05/24/2007).
How to use it? You can refer to the installation
guide, the user manual,
and the FAQ.
Extras:
| File | Description |
|---|---|
| nimblegen2dcn.pl | Perl script that converts the normalized data file generated by NimbleGen's array CGH platform to the .dcn file accepted by DCNDenoise. The input file is [array id]_normalized.txt on NimbleGen's data CD/DVD. |
| hg17_chr_len.ini | Chromosome info file needed by DCNDenoise. Built for the hg17 human genome assembly (NCBI Build 35). |
| hg18_chr_len.ini | Chromosome info file needed by DCNDenoise. Built for the hg18 human genome assembly (NCBI Build 36). |