Geospatial Data
This project develops data mining algorithms for geospatial data, covering
both association rules and prediction. This GOALI (Grant Opportunities
for Academic Liaison with Industry) research is performed in collaboration
with the SIVAM project underway at Raytheon Systems Company.
Problem:
The Amazon river basin, which contains nearly one third of the world's
tropical rain forest area, is essential to the climate and biological
diversity of the planet. Recognizing that currently available resources are
insufficient for Brazil to collect data and generate useful knowledge about
the region's potential, limitations, and realities, the Brazilian Government
decided to create the System for the Vigilance of the Amazon (SIVAM).
SIVAM:
SIVAM will consist of a large number of sensors and remote user stations
connected to regional coordination centers by an extensive
telecommunications network. SIVAM's objective is to implement a
surveillance and analysis infrastructure, including a very large
(multi-terabyte) geospatial database and associated visualization tools,
that will provide the Brazilian Government with the information necessary
for the protection and sustainable development of the Amazon region.
Raytheon:
Raytheon Systems Company, Garland Division, is responsible for the
hardware and software development of SIVAM through a grant from the World
Bank and the government of Brazil.
SMU Database Group:
Our research focuses on a narrow piece of the overall problem: developing
new algorithms for a specific target application. The algorithms scale to
the massive amounts of data involved and adapt to the amount of main memory
available. In the SIVAM project, data is obtained on an ongoing basis (as
time advances). The prediction algorithms are used to predict environmental
catastrophes (such as flooding or deforestation) and are incremental in
nature. They keep state information that "remembers" previously collected
environmental data; as new data arrives, the state is advanced based on that
data. In addition, the structures used to store this state are themselves
modified as learning takes place.
The objective of this research is to develop classification and association
rule algorithms targeted to the SIVAM applications. Because SIVAM's
requirements are not unique to the Amazon, the resulting tools will also be
applicable to other geospatial databases.
Our initial work during Fall 1999 focused on an extensive survey of
association rule work and, simultaneously, on developing algorithms that
make more effective use of main memory. We observed that the number of
candidate itemsets per iteration follows a roughly normal (bell-shaped)
distribution: there are more candidates in the middle iterations and fewer
in the beginning and ending iterations. Apriori scans the database once per
iteration regardless of the number of candidates, so main memory usage is
very low in the beginning and ending iterations, while the middle iterations
may not have enough memory. We developed strategies to even out the number
of candidates across iterations by redistributing candidates from the
middle iterations to the beginning and ending iterations. Recently we have
begun investigating techniques to perform clustering interactively and
dynamically; these will in turn help us examine similar classification
approaches.
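To see how candidates are spread across iterations, the textbook Apriori level-wise loop can be instrumented to record the number of candidates in each pass. The sketch below does only that; it does not implement the group's redistribution strategy, which the text does not describe in detail. It assumes transactions are iterables of hashable items and that `min_support` is an absolute count; `apriori` and `candidates_per_pass` are illustrative names.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Textbook level-wise Apriori, instrumented to count candidates per pass."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}             # itemset -> support count
    candidates_per_pass = []  # one entry per database scan
    k = 1
    while candidates:
        candidates_per_pass.append(len(candidates))
        # One full database scan per pass, however many candidates there are.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c for c, n in counts.items() if n >= min_support}
        frequent.update({c: counts[c] for c in level})
        # Join step: unite pairs of frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, candidates_per_pass
```

Each entry in `candidates_per_pass` corresponds to one full scan over `transactions`, which is why a pass with few candidates leaves memory idle while a pass with many may exceed it.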
We are currently concentrating on improving these clustering and
classification algorithms and on investigating spatial-temporal association
rules, which contain spatial and/or temporal predicates. These rules will be
the basis for our investigation of effective classification and prediction
techniques applicable to SIVAM; we plan to use the association rules to
perform the classification. We are also beginning to survey existing
prediction techniques for SIVAM-related issues such as flooding and
deforestation.
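One common way to use association rules for classification is to mine rules whose consequent is a class label and to classify a new record with the highest-ranked rule it satisfies. The sketch below assumes each record has already been reduced to a set of boolean spatial or temporal predicates; the predicate names (`near_river`, `low_elevation`, `heavy_rain_prev_week`, `dry_season`) and the `flood`/`no_flood` labels are hypothetical. For brevity it enumerates antecedents by brute force up to `max_len` predicates instead of using Apriori, and ranks rules by confidence, then support, then length, loosely following CBA-style rule ordering. It illustrates the idea and is not the project's method.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(records, min_support=2, min_confidence=0.6, max_len=2):
    """Mine rules 'predicate set -> label' from (predicate set, label) records."""
    rules = []
    predicates = sorted({p for preds, _ in records for p in preds})
    for size in range(1, max_len + 1):
        for antecedent in combinations(predicates, size):
            ant = frozenset(antecedent)
            # Class distribution among records that satisfy every predicate.
            labels = Counter(label for preds, label in records if ant <= preds)
            covered = sum(labels.values())
            for label, support in labels.items():
                confidence = support / covered
                if support >= min_support and confidence >= min_confidence:
                    rules.append((ant, label, confidence, support))
    # Rank by confidence, then support, then fewer predicates (loosely CBA-style).
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0])))
    return rules


def classify(rules, predicates, default):
    """Return the label of the highest-ranked rule whose antecedent holds."""
    for ant, label, _, _ in rules:
        if ant <= predicates:
            return label
    return default
```

For a new grid cell described by its predicates, `classify(rules, {"dry_season", "heavy_rain_prev_week"}, "no_flood")` returns the label of whichever matching rule ranks first, falling back to the default when no rule applies.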
This material is based on work supported by the National Science
Foundation under Grant No. 9820841. Any opinions, findings, and conclusions
or recommendations expressed in this material are those of the author and
do not necessarily reflect the views of the National Science Foundation.