1. HykGene: A Hybrid Approach for Selecting Marker Genes for Phenotype Classification using Microarray Gene Expression Data

Why? Recent studies have shown that microarray gene expression data are useful for phenotype classification of many diseases. In this classification problem, the number of features (genes) greatly exceeds the number of instances (tissue samples), and selecting a small set of informative genes has been shown to improve classification accuracy in such settings. Many approaches have been proposed for this gene selection problem. Most previous gene ranking methods select the 50-200 top-ranked genes, and these genes are often highly correlated with one another. HykGene aims to select a small set of non-redundant marker genes that are most relevant to the classification task.

How? To achieve this goal, we developed a novel hybrid approach that combines gene ranking with cluster analysis. We first apply feature filtering algorithms to select a set of top-ranked genes, and then apply hierarchical clustering to these genes to generate a dendrogram. Finally, a sweep-line algorithm analyzes the dendrogram, and marker genes are selected by collapsing dense clusters.
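The three steps above can be illustrated with a minimal Python sketch. It is not the HykGene implementation, and every name in it (select_markers, t_score, corr_distance, top_n, cut) is hypothetical. It assumes a two-class t-statistic as the filter, 1 - |Pearson correlation| as the distance between genes, and average linkage. It also replaces the sweep-line analysis with a fixed distance cutoff, and collapses each cluster to its best-ranked gene, which may differ from the representative HykGene actually chooses.

```python
from math import sqrt
from statistics import mean, stdev

def t_score(values, labels):
    """Absolute two-sample t statistic of one gene (assumed filter criterion)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def corr_distance(x, y):
    """1 - |Pearson r|: close to 0 when two genes carry redundant information."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return 1.0 - abs(num / den) if den > 0 else 1.0

def select_markers(expr, labels, top_n, cut):
    # Step 1 (filtering): keep the top_n genes by filter score.
    scores = {g: t_score(v, labels) for g, v in expr.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:top_n]

    # Step 2 (clustering): average-linkage agglomerative clustering.
    clusters = [[g] for g in top]

    def linkage(c1, c2):
        return mean(corr_distance(expr[a], expr[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > cut:  # assumption: fixed cutoff stands in for the sweep line
            break
        merged = clusters.pop(j)
        clusters[i].extend(merged)

    # Step 3 (collapsing): one representative gene per cluster.
    return sorted(max(c, key=scores.get) for c in clusters)

# Toy data: 4 genes x 6 samples; the first 3 samples are class 0, the rest class 1.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "g2": [1.1, 1.1, 1.0, 2.8, 3.2, 3.0],  # nearly redundant with g1
    "g3": [1.0, 3.0, 2.0, 4.0, 6.0, 5.0],  # informative, but noisier
    "g4": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # uninformative
}
print(select_markers(expr, labels, top_n=3, cut=0.05))
```

On this toy data the filter drops g4, the clustering merges the nearly redundant pair g1 and g2, and the collapsing step keeps g1 and g3 as markers.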

Does it work? An empirical study on three public data sets shows that our approach selects relatively few marker genes while offering the same or even better leave-one-out cross-validation accuracy than approaches that use all of the top-ranked genes directly for classification. For details, refer to our paper and its supplementary information site.

How to get it? HykGene is freely available for academic users. You can get it here.

How to run it? You can refer to the installation guide, the user manual, and the FAQ.

 

2. SWTi-DCNDen: A Stationary Wavelet Denoising Method for Unequally Spaced Array-Based DNA Copy Number Data

Why? High-resolution DNA copy number data can be derived from array-based comparative genomic hybridization (array CGH) and single-nucleotide polymorphism (SNP) arrays. Such data is typically very noisy. A few denoising schemes have previously been applied to DNA copy number data, and among them the wavelet denoising method has been shown to perform best for this application. However, all of the previous denoising methods for DNA copy number data, including the wavelet-based method, ignore the physical distances between probes and assume that the probes are uniformly spaced. Simple reasoning shows that a denoising method that assumes uniform spacing can give incorrect results when applied to unequally spaced DNA copy number data.

How? To address this issue, we developed a novel stationary wavelet transform (SWT) denoising scheme that uses interpolation (hence the "i" in SWTi) for unequally spaced array-based DNA copy number data. For this scheme, we also extended to the SWT case the covariance model for discrete wavelet transform (DWT) coefficients obtained from correlated noise.
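The general idea of interpolating before a stationary wavelet transform can be sketched in a few lines of Python. This is an illustration only, not the DCNDenoise code, and every function and parameter name in it is hypothetical. It assumes linear interpolation onto a uniform grid, an undecimated Haar transform with periodic boundaries, and soft thresholding with a per-level universal threshold estimated from the median absolute deviation. It does not include the correlated-noise covariance model mentioned above.

```python
from bisect import bisect_right
from math import log, sqrt
from statistics import median

def interp(xq, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at xq; xs strictly increasing."""
    out = []
    for x in xq:
        k = min(max(bisect_right(xs, x), 1), len(xs) - 1)
        w = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
        out.append(ys[k - 1] + w * (ys[k] - ys[k - 1]))
    return out

def haar_swt(x, levels):
    """Undecimated Haar transform; periodic boundary handling is an assumption."""
    n, approx, details = len(x), list(x), []
    for j in range(levels):
        s = 2 ** j  # filter taps are 2**j samples apart at level j
        details.append([(approx[i] - approx[(i + s) % n]) / 2 for i in range(n)])
        approx = [(approx[i] + approx[(i + s) % n]) / 2 for i in range(n)]
    return approx, details

def haar_iswt(approx, details):
    """Inverse transform: average the two shifted reconstructions per level."""
    n, a = len(approx), list(approx)
    for j in reversed(range(len(details))):
        s, d = 2 ** j, details[j]
        a = [0.5 * ((a[i] + d[i]) + (a[(i - s) % n] - d[(i - s) % n]))
             for i in range(n)]
    return a

def soft(v, t):
    """Soft thresholding: shrink v toward zero by t."""
    return (abs(v) - t) * (1 if v > 0 else -1) if abs(v) > t else 0.0

def swti_denoise(positions, values, grid_size=64, levels=3):
    # 1. Interpolate the unequally spaced probes onto a uniform grid.
    lo, hi = positions[0], positions[-1]
    grid = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    approx, details = haar_swt(interp(grid, positions, values), levels)
    # 2. Shrink detail coefficients (assumed: MAD-based universal threshold).
    shrunk = []
    for d in details:
        sigma = median(abs(v) for v in d) / 0.6745
        t = sigma * sqrt(2 * log(len(d)))
        shrunk.append([soft(v, t) for v in d])
    # 3. Invert the transform and read the result back at the probe positions.
    return interp(positions, grid, haar_iswt(approx, shrunk))

# Hypothetical probe positions (unequally spaced) and log2 ratios.
positions = [0, 10, 15, 40, 45, 100, 180, 200]
values = [0.05, -0.02, 0.01, 0.03, 0.92, 1.05, 0.98, 1.01]
print(swti_denoise(positions, values))
```

Interpolating first lets a standard transform, which assumes equal spacing, operate on a grid where distance along the chromosome is represented faithfully. The grid size and the number of levels are free parameters in this sketch.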

Does it work? Empirical results on synthetic data, generated by a model based on real array CGH data, showed that the proposed SWTi denoising scheme outperformed the previous wavelet denoising method, which is based on the maximal overlap discrete wavelet transform (MODWT), by 4.6% to 12.7% as measured by the overall root mean squared error. Experiments on a public array CGH data set also confirmed that the SWTi denoising scheme is applicable to real array CGH data. You can refer to the paper and supplementary information for details.

How to get it? Software with both graphical and command-line user interfaces for MS Windows is freely available for academic users. You can download it here (Build 05/24/2007).

How to use it? You can refer to the installation guide, the user manual, and the FAQ.

Extras:

nimblegen2dcn.pl: Perl script that converts the normalized data file generated by NimbleGen's array CGH platform to the .dcn file accepted by DCNDenoise. The input file is [array id]_normalized.txt on NimbleGen's data CD/DVD.
hg17_chr_len.ini: Chromosome info file needed by DCNDenoise. Built for the hg17 human genome assembly (NCBI Build 35).
hg18_chr_len.ini: Chromosome info file needed by DCNDenoise. Built for the hg18 human genome assembly (NCBI Build 36).


 
Updated: May 24, 2007