STREAM ABSTRACTION
What is needed is a higher-level approach to processing sensor data.
Such an approach provides levels of abstraction and incorporates
functionality at every level of sensor data management, from the
low-level generation of the raw data to the high-level presentation of
the data to the domain expert who must make decisions and
recommendations based on it.
There is simply too much data to assume that domain experts will
continually issue queries against the sensor/stream data. Instead, our
model envisions push-based applications that send the domain expert the
information needed to make decisions. The figure below illustrates our
hierarchical view of how sensor stream data should be visualized. In our
research we concentrate on the top two levels.
As seen in the figure, our hierarchical stream data model has four levels:
Level 0 – Physical Level: This is the level where the raw data is
generated. We assume that sensors are used to obtain the data. The
sensors are placed at many sites, and they may move. The raw data at
this level may or may not actually be stored.
Level 1 – DSMS: At this level the sensor data is merged, aggregated, and
cleansed. We assume that a data stream management system (DSMS) is
responsible for receiving the data from the many sensors. In addition,
normal DSMS queries may be processed against this data.
Level 2 – Model: A model at this level summarizes the data streams
processed at Level 1. This summarization is not like the aggregation
used in data warehouses; rather, it is a high-level view of what the
stream data has looked like and currently looks like. The model is
created for all of the streams processed at Level 1. It captures not
only the data obtained by the sensors, but also the spatial aspect of
the data (where the sensor is located) and the temporal aspect. It is
important to note that the temporal aspect is not only the timestamp of
the data, but also the ordering of the data from the sensors. In our
overview we assume that a dynamic first-order Markov chain is used. This
dynamic model not only summarizes the data, but also captures concept
drift. Machine learning techniques allow the model to learn and forget
data over time. Clustering techniques are used to ensure a sub-linear
growth rate of this model.
Level 3 – Domain Expert: We call this level Domain Expert because we
assume its primary users are domain experts who need to examine the
sensor data at an extremely high level. They rely upon the output of
data mining applications applied to the model data. They will examine
visual summaries of the data, the output of anomaly detection software,
and other data mining output. The exact view that each domain expert has
depends entirely on that expert's needs. No real data exists at this
level. The external view is only visual.
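
To make the Level 2 description above more concrete, the following is a minimal sketch of one way such a model could be maintained. It is an illustration under stated assumptions, not the implementation used in this research. Clusters of readings serve as the Markov chain states, transition counts between consecutive states capture the ordering of the data, and an exponential decay factor stands in for "forgetting". The class name DynamicMarkovModel and the parameters radius and decay are hypothetical.

```python
import math
from collections import defaultdict


class DynamicMarkovModel:
    """Illustrative Level 2 summary: clusters act as states of a
    first-order Markov chain whose transition counts fade over time."""

    def __init__(self, radius, decay=0.99):
        self.radius = radius   # assumed: max distance for a reading to join a cluster
        self.decay = decay     # assumed: fading factor applied on every update
        self.centers = []      # one centroid per state
        self.weights = []      # decayed number of readings absorbed per state
        self.trans = defaultdict(float)  # (from_state, to_state) -> decayed count
        self.current = None    # state of the most recent reading

    def _nearest(self, point):
        """Return (index, distance) of the closest centroid, or (None, inf)."""
        best, best_dist = None, float("inf")
        for i, center in enumerate(self.centers):
            d = math.dist(point, center)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def transition_probability(self, src, dst):
        """Estimate P(dst | src) from the decayed transition counts."""
        total = sum(v for (s, _), v in self.trans.items() if s == src)
        return self.trans.get((src, dst), 0.0) / total if total > 0 else 0.0

    def update(self, point):
        """Absorb one reading. Returns (state, probability of the
        transition just taken, measured before the counts are updated)."""
        # Forgetting: fade every existing count so old behaviour loses weight.
        for key in self.trans:
            self.trans[key] *= self.decay
        self.weights = [w * self.decay for w in self.weights]

        # Clustering: a new state is created only when no cluster is close
        # enough, so the model grows with distinct behaviours, not with readings.
        state, dist = self._nearest(point)
        if state is None or dist > self.radius:
            self.centers.append(tuple(point))
            self.weights.append(1.0)
            state = len(self.centers) - 1
        else:
            w = self.weights[state]
            self.centers[state] = tuple(
                (c * w + p) / (w + 1) for c, p in zip(self.centers[state], point)
            )
            self.weights[state] = w + 1

        # Learning: record the transition from the previous state.
        prob = None
        if self.current is not None:
            prob = self.transition_probability(self.current, state)
            self.trans[(self.current, state)] += 1.0
        self.current = state
        return state, prob


# Hypothetical readings, e.g. (temperature, humidity) pairs from one sensor.
model = DynamicMarkovModel(radius=1.0, decay=0.9)
for reading in [(20.0, 0.5), (20.2, 0.5), (25.0, 0.5), (20.1, 0.5), (25.1, 0.5)]:
    state, prob = model.update(reading)
    print(state, prob)
```

In a sketch like this, a low transition probability marks a reading sequence the model has rarely seen, which is the kind of signal a push-based Level 3 application could forward to a domain expert. Spatial information could be included by adding sensor coordinates to each reading, again as an assumption of the sketch rather than a description of the actual model.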