IDA@SMU

TRACDS: Temporal Relationships Among Clusters for Massive Data Streams

State-of-the-art data stream clustering algorithms developed by the data mining community do not use the temporal order of events, so all temporal information is lost in the resulting clustering. This is a notable gap, since temporal ordering of events is one of the salient features of data streams. In this project we develop a technique to efficiently incorporate temporal ordering into the clustering process and demonstrate its usefulness on large, high-throughput data streams. Temporal ordering is introduced into the data stream clustering process by dynamically constructing an evolving Markov chain whose states represent clusters. Our approach is based on the previously developed Extensible Markov Model (EMM). The results of this project will provide a framework on which important stream mining applications, such as anomaly detection and prediction of future events, can be easily implemented.
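
To make the idea of "clusters as states of an evolving Markov chain" concrete, here is a minimal Python sketch under stated assumptions. It uses a simple threshold-based nearest-neighbor clustering with running-mean centers. The class `EvolvingMarkovClusterer`, its `threshold` parameter, and its methods are hypothetical illustrations. They are not the published EMM algorithm and not the rEMM package's API. Transition probabilities are estimated from counts of consecutive state pairs. That is what makes prediction (the most likely next state) and a simple anomaly rule (a rarely seen transition) easy to build on top.

```python
import math
from collections import defaultdict


class EvolvingMarkovClusterer:
    """Hypothetical sketch (not the rEMM API): threshold-based online
    clustering whose clusters are the states of an evolving Markov chain."""

    def __init__(self, threshold):
        self.threshold = threshold           # assumed max distance for joining a cluster
        self.centers = []                    # running-mean center of each state
        self.sizes = []                      # number of points assigned to each state
        self.transitions = defaultdict(int)  # (from_state, to_state) -> count
        self.current = None                  # state of the most recent point

    def _assign(self, point):
        # Find the nearest existing cluster center (Euclidean distance).
        best, best_dist = None, math.inf
        for i, center in enumerate(self.centers):
            d = math.dist(point, center)
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= self.threshold:
            # Join that cluster and move its center toward the new point.
            n = self.sizes[best] + 1
            self.centers[best] = [c + (p - c) / n for c, p in zip(self.centers[best], point)]
            self.sizes[best] = n
            return best
        # No center is close enough: the point starts a new state.
        self.centers.append(list(point))
        self.sizes.append(1)
        return len(self.centers) - 1

    def update(self, point):
        """Cluster one point and count the transition from the previous state."""
        state = self._assign(point)
        if self.current is not None:
            self.transitions[(self.current, state)] += 1
        self.current = state
        return state

    def transition_prob(self, i, j):
        """Estimated P(next = j | current = i) from observed transition counts."""
        total = sum(c for (a, _), c in self.transitions.items() if a == i)
        return self.transitions.get((i, j), 0) / total if total else 0.0

    def predict_next(self, state=None):
        """Most frequently observed successor of `state` (default: current state)."""
        state = self.current if state is None else state
        successors = {b: c for (a, b), c in self.transitions.items() if a == state}
        return max(successors, key=successors.get) if successors else None

    def is_rare_transition(self, i, j, min_prob=0.05):
        # Hypothetical anomaly rule: flag transitions whose estimated
        # probability is below min_prob (including never-seen transitions).
        return self.transition_prob(i, j) < min_prob


emm = EvolvingMarkovClusterer(threshold=1.0)
stream = [(0, 0), (5, 5), (0, 0.2), (5, 5.1), (0, 0.1), (10, 10)]
print([emm.update(p) for p in stream])  # [0, 1, 0, 1, 0, 2]
print(emm.predict_next(0))              # 1
```

Because a new state is created whenever a point lies farther than `threshold` from every existing center, the chain grows as the stream evolves. A practical implementation would also need ways to forget, remove, or merge states over time, which this sketch deliberately omits.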

Broader Impact. By showing that state-of-the-art data stream clustering algorithms can efficiently incorporate temporal order information, this project will have a broad impact on the many areas where temporal order is essential. NOAA hurricane data and NASA satellite data will serve as example applications throughout this project.

Team

Matt Bolanos, Sudheer Chelluboina, Margaret H. Dunham (Co-PI), John Forrest, Michael Hahsler (Co-PI), Vladimir Jovanovic, Hadil Shaiba, Yu Su

Publications

  1. Anurag Nagar and Michael Hahsler. Using text and data mining techniques to extract stock market sentiment from live news streams. In 2012 International Conference on Computer Technology and Science (ICCTS 2012), August 2012.
  2. Charlie Isaksson, Margaret H. Dunham, and Michael Hahsler. SOStream: Self organizing density-based clustering over data stream. In International Conference on Machine Learning and Data Mining (MLDM'2012). Springer, July 2012.
  3. Vladimir Jovanovic, Margaret H. Dunham, Michael Hahsler, and Yu Su. Evaluating hurricane intensity prediction techniques in real time. In Third IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Proceedings of the 2011 IEEE International Conference on Data Mining Workshops (ICDMW 2011). IEEE, 2011.
  4. John Forrest. Stream: A Framework for Data Stream Modeling in R. Bachelor Thesis, Department of Computer Science and Engineering, SMU, 2011.
  5. Michael Hahsler and Margaret H. Dunham. Temporal structure learning for clustering massive data streams in real-time. In SIAM Conference on Data Mining (SDM11). SIAM, 2011.
  6. Yu Su, Sudheer Chelluboina, Michael Hahsler, and Margaret H. Dunham. A new data mining model for hurricane intensity prediction. In Second IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Proceedings of the 2010 IEEE International Conference on Data Mining Workshops (ICDMW 2010). IEEE, 2010.
  7. Margaret H. Dunham, Michael Hahsler, and Myra Spiliopoulou. Novel data stream pattern mining: Report on the StreamKDD'10 workshop. SIGKDD Explorations, 12(2):54-55, 2010.
  8. Michael Hahsler and Margaret H. Dunham. rEMM: Extensible Markov Model for Data Stream Clustering in R. Journal of Statistical Software, 35(5):1-31, 2010.
  9. Margaret H. Dunham, Michael Hahsler, and Myra Spiliopoulou, editors. Proceedings of the First International Workshop on Novel Data Stream Pattern Mining Techniques (StreamKDD'10). ACM Press, New York, NY, USA, 2010.

Acknowledgement of Support

This research is supported by the National Science Foundation under Grant No. IIS-0948893.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
