SMU Database Research Group Software and Data

Below are links to software and associated data used to perform experiments in our group. We’ve also included some software developed in classes. Please feel free to take and use what you wish. We would appreciate proper acknowledgement if you use the data or code in publications. Contact Professor Dunham (mhd@engr.smu.edu) with questions or problems. Have fun!



EMMCM - EMM plus Meta Classifier Modeling Tool:
The SMU DBGROUP is creating a tool that will include all features of EMM, TCGR, and a metaclassification technique called MCM. As of 2/13/09, a beta version is available, with the functionality split across separate pieces of code. Below are links to the beta versions of the software. More complete versions will be added as they become available.
  • Unix version of EMM
  • Windows version of EMM
  • mcm
  • tcgr


  • BDMine (BioDegradation Mine):
    BDMine is a set of chemical compounds and related data that have been collected from many different sources. Our hypothesis is that a prediction of biodegradation can be made based on properties of the compounds separate from the physical chemical structure. We welcome users of this data and their feedback, as well as any help in collecting the data and keeping it accurate.
  • BDMine User Guide
  • Online Version of BDMine
  • Complete BDMine software


  • Java API for WordNet Searching (JAWS):
    JAWS is an API that gives Java applications access to WordNet. JAWS was developed by Brett Spell as a class project in CSE 8337 in Spring 2007.
  • JAWS Download


  • Extensible Markov Model (EMM):
    EMM is a dynamic first-order Markov model used to model spatial-temporal data obtained from data streams.
  • EMM Description
    Matlab Code:
  • EMM Matlab Code Description
  • EMM Code
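
To give a feel for what "dynamic first-order Markov model" can mean for streaming data, here is a minimal Python sketch. It is not the group's Matlab code. It assumes a simple scheme in which each incoming vector is matched to the nearest existing state by Euclidean distance, a new state is created when no state lies within a threshold, and transition counts between consecutive states are updated. The class name `SimpleEMM`, the `threshold` parameter, and the distance measure are all illustrative assumptions.

```python
import math


class SimpleEMM:
    """Toy first-order Markov model whose state set grows with the stream.

    Hypothetical illustration only; names and matching rule are assumptions.
    """

    def __init__(self, threshold):
        self.threshold = threshold  # max distance at which a state is reused
        self.centroids = []         # one representative vector per state
        self.transitions = {}       # (from_state, to_state) -> count
        self.current = None         # index of the most recently visited state

    def _nearest(self, point):
        """Return (index, distance) of the closest state, or (None, inf)."""
        best, best_dist = None, math.inf
        for i, centroid in enumerate(self.centroids):
            d = math.dist(point, centroid)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def observe(self, point):
        """Consume one observation and return the state it was assigned to."""
        state, dist = self._nearest(point)
        if state is None or dist > self.threshold:
            # No existing state is close enough: add a new one.
            self.centroids.append(tuple(point))
            state = len(self.centroids) - 1
        if self.current is not None:
            key = (self.current, state)
            self.transitions[key] = self.transitions.get(key, 0) + 1
        self.current = state
        return state

    def transition_probability(self, src, dst):
        """Estimate P(dst | src) from the observed transition counts."""
        total = sum(n for (s, _), n in self.transitions.items() if s == src)
        if total == 0:
            return 0.0
        return self.transitions.get((src, dst), 0) / total
```

For example, feeding the points (0, 0), (0.5, 0), (5, 5), (0.1, 0.1), (5.2, 5.0) with `threshold=1.0` creates two states, and `transition_probability(0, 1)` estimates how often the stream moves from the first state to the second.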
  • Temporal Chaos Game Representation (TCGR):
    TCGR is a visualization tool designed to view spatial-temporal data obtained from data streams. It has its origin in Chaos Game Representation and thus has been applied to view DNA and RNA data. However, it has been extended to many other types of data as well.
  • TCGR Description
  • TCGR Visualization Programs version 1.16b
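
As background for the Chaos Game Representation that TCGR builds on, the following Python sketch computes the classic CGR points for a DNA string. It does not reproduce TCGR's temporal extension or the visualization programs linked above. The corner assignment (A, C, G, T on the unit square) follows a common convention and is an assumption here, as is the function name `cgr_points`.

```python
# Corners of the unit square, one per nucleotide (a common convention, assumed here).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the list of CGR points for a DNA sequence.

    Starting from the centre of the square, each nucleotide moves the
    current point halfway toward that nucleotide's corner.
    """
    x, y = 0.5, 0.5
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip symbols outside A, C, G, T (for example N)
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

Plotting the returned points as a scatter plot gives the familiar CGR image, in which frequently occurring subsequences show up as dense regions.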


  • Flood Prediction Data:
  • Data Description
  • Ouse
  • Derwent


  • Web Usage Mining:
    The OAT algorithm finds maximal frequent sequences. It uses a suffix tree data structure, which is compressed to ensure scalability.
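
To illustrate what "maximal frequent sequences" means, here is a brute-force Python sketch. It is not the OAT algorithm and does not use a suffix tree; it simply enumerates candidates, so it only suits small inputs. It assumes that sequences are contiguous runs of page references within a session, that support is the number of sessions containing the sequence, and that a frequent sequence is maximal when no longer frequent sequence contains it. The function name and the `min_support` parameter are hypothetical.

```python
from collections import Counter


def maximal_frequent_sequences(sessions, min_support):
    """Return the maximal frequent contiguous sequences, sorted.

    Brute-force illustration only; it enumerates every contiguous
    subsequence of every session.
    """
    support = Counter()
    for session in sessions:
        seen = set()
        n = len(session)
        for i in range(n):
            for j in range(i + 1, n + 1):
                seen.add(tuple(session[i:j]))
        support.update(seen)  # each session counts at most once per sequence

    frequent = {s for s, count in support.items() if count >= min_support}

    def contains(longer, shorter):
        """True if `shorter` occurs as a contiguous run inside `longer`."""
        k = len(shorter)
        return any(longer[i:i + k] == shorter
                   for i in range(len(longer) - k + 1))

    return sorted(
        s for s in frequent
        if not any(len(t) > len(s) and contains(t, s) for t in frequent)
    )
```

For instance, with the sessions `["A", "B", "C"]`, `["A", "B", "D"]`, and `["B", "C"]` and `min_support=2`, the frequent sequences are A, B, C, AB, and BC, of which only AB and BC are maximal.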