Geospatial Data

This project develops data mining algorithms, for both association rule discovery and prediction, targeted to geospatial data. This GOALI (Grant Opportunities for Academic Liaison with Industry) research is performed in collaboration with the SIVAM project underway at Raytheon Systems Company.

Problem:

The Amazon river basin, which has nearly one third of the entire area of the world's tropical rain forests, is essential for the climate and biological diversity of the planet Earth. Based on the awareness that the resources available today are insufficient for the Brazilians to collect data and generate useful knowledge on the region's potentialities, limitations, and realities, the Brazilian Government decided to create the System for the Vigilance of the Amazon (SIVAM).

SIVAM:

SIVAM will be composed of a large number of sensors and remote user stations connected to regional coordination centers by an extensive telecommunications network. SIVAM's objective is to implement a surveillance and analysis infrastructure, including a very large (multi-terabyte) geospatial database and associated visualization tools, that will provide the Brazilian Government with the information necessary for the protection and sustainable development of the Amazon region.

Raytheon:

Raytheon Systems Company, Garland Division, has the responsibility for the hardware and software development of SIVAM through a grant from the World Bank and the government of Brazil.

SMU Database Group:

Our research focuses on a very narrow piece of the overall problem: development of new algorithms for a specific target application. The developed algorithms scale to the massive amounts of data present and adapt to the available amount of main memory. In the SIVAM project, data is obtained on an ongoing basis (as time advances). The prediction algorithms are used to predict environmental catastrophes (such as flooding or deforestation) and are incremental in nature. State information is kept which "remembers" previously collected environmental data. As new data arrives, the state is advanced based on that data. In addition, the structures used to save this state information are modified as learning takes place.
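The incremental idea described above (keep state, advance it as each new reading arrives) can be illustrated with a minimal sketch. This is not the project's prediction algorithm: the running mean and variance (Welford's update), the threshold rule, the names RunningState and predict_then_learn, and the sample river-level readings are all assumptions made for illustration. The sketch also keeps a fixed state structure, whereas the text describes structures that change as learning takes place.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningState:
    """Hypothetical state that 'remembers' past readings in O(1) memory."""
    n: int = 0          # number of readings seen so far
    mean: float = 0.0   # running mean of the readings
    m2: float = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Welford's incremental update: no past reading is stored or rescanned.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; defined as 0.0 until two readings exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def predict_then_learn(state: RunningState, reading: float, k: float = 3.0) -> bool:
    """Flag the reading against the current state, then advance the state."""
    alarm = state.n > 1 and reading > state.mean + k * state.std()
    state.update(reading)
    return alarm


# Made-up river-level readings (metres), arriving one at a time.
readings = [2.0, 2.1, 1.9, 2.0, 2.1, 5.0]
state = RunningState()
alarms = [predict_then_learn(state, r) for r in readings]
```

Each reading is examined once and then folded into the state, so memory use does not grow with the length of the data stream.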

The objective of this research is to develop classification and association rule algorithms targeted to the SIVAM applications. As the SIVAM requirements are not unique to the Amazon area, the resulting tools will be applicable to other geospatial databases as well.

Our initial work during Fall 1999 focused on performing an extensive survey of association rule work and simultaneously developing algorithms targeted at more effective use of main memory. We observe that the number of candidates per iteration follows a distribution similar to a normal distribution. That is, more candidates exist in the middle iterations and fewer at the beginning and ending iterations. For Apriori, there is a database scan at each iteration regardless of the number of candidates. Therefore main memory usage is very low at the beginning and ending iterations, while there may be insufficient memory in the middle iterations. We developed strategies to even out the number of candidates across iterations, so that candidates from the middle iterations are spread to the beginning and ending iterations. Recently we have begun investigating techniques to perform clustering interactively and dynamically. These will in turn help us examine similar classification approaches.
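The observation about candidate counts can be reproduced with a small sketch of textbook level-wise Apriori that records how many candidates each pass must hold in memory. This is the standard algorithm, not the candidate-balancing strategy developed in this project; the function name apriori_candidate_counts and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def apriori_candidate_counts(transactions, min_support):
    """Textbook Apriori; returns (frequent itemsets with support, candidates per pass)."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1 candidates: every distinct item as a 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    counts_per_pass = []
    k = 1
    while candidates:
        counts_per_pass.append(len(candidates))
        # One full database scan per pass, however few candidates there are.
        support = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    support[c] += 1
        level = {c for c, n in support.items() if n >= min_support}
        for c in level:
            frequent[c] = support[c]
        # Join step: combine frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, counts_per_pass


# Made-up transactions; each string is a set of single-letter items.
toy_transactions = ["abcd", "abcd", "abc", "abd", "e"]
frequent, counts = apriori_candidate_counts(toy_transactions, min_support=2)
print(counts)  # [5, 6, 4, 1]
```

On this toy input the candidate count peaks at the second pass and falls away afterwards, while every pass still costs one full scan of the data.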

We are currently concentrating on improving these clustering/classification algorithms and investigating spatial-temporal association rules, that is, rules which contain spatial and/or temporal predicates. These rules will be the basis for our investigation of effective classification and prediction techniques applicable to SIVAM. We plan to use the association rules to perform the classification. We are also beginning to survey previous prediction techniques devoted to SIVAM-related issues such as flooding and deforestation.
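As an illustration of what a rule with spatial and temporal predicates might look like, the sketch below scores one hypothetical rule, near(river) AND season=wet => flooded, by the usual support and confidence measures. The predicate names, the records, and the function rule_stats are invented for this example and do not come from SIVAM data.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    n_ante = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)                      # fraction of all records matching the whole rule
    confidence = n_both / n_ante if n_ante else 0.0      # fraction of antecedent matches that also match the consequent
    return support, confidence


# Each record is the set of predicates that hold for one (location, time) observation.
records = [
    {"near(river)", "season=wet", "flooded"},
    {"near(river)", "season=wet", "flooded"},
    {"near(river)", "season=dry"},
    {"near(road)", "season=wet"},
    {"near(river)", "season=wet"},
]

support, confidence = rule_stats(
    records,
    antecedent={"near(river)", "season=wet"},  # one spatial and one temporal predicate
    consequent={"flooded"},
)
```

Here the rule holds in 2 of the 5 records (support 0.4) and in 2 of the 3 records that satisfy the antecedent (confidence about 0.67). A rule with high enough confidence could then serve as a simple classifier for the consequent, which is the general direction the text describes.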

This material is based on work supported by the National Science Foundation under Grant No. 9820841. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.