STREAM ABSTRACTION

What is needed is a higher-level approach to processing sensor data. Such an approach provides levels of abstraction and incorporates functionality at every level of sensor data management, from the low-level generation of the raw data to the high-level presentation of the data to the domain expert who must make decisions and recommendations based on it. There is simply too much data to assume that domain experts will continually issue queries against the sensor/stream data. Instead, our model envisions push-based applications that send the domain expert the information needed to make decisions. The figure below illustrates our hierarchical view of how sensor stream data should be visualized. In our research we concentrate on the top two levels.

[Figure: hierarchical view of sensor stream data]

As seen in the figure, our hierarchical stream data model has four levels:

Level 0 – Physical Level: This is the level where the raw data is generated. We assume that sensors are used to obtain the data. The sensors are placed at many sites, and they may move. The raw data at this level may or may not actually be stored.

Level 1 – DSMS: At this level the sensor data is merged, aggregated, and cleansed. We assume that a DSMS is responsible for receiving the data from the many sensors. In addition, normal DSMS queries may be processed against this data.
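The merge, cleanse, and aggregate steps at Level 1 can be illustrated with a minimal sketch. It is not a DSMS implementation: the reading format `(timestamp, sensor_id, value)`, the function names, the plausible-range filter, and the tumbling-window average are all assumptions made for illustration only.

```python
import heapq
from statistics import mean

def merge_streams(*streams):
    """Merge per-sensor streams (each already sorted by timestamp) into one."""
    return heapq.merge(*streams, key=lambda reading: reading[0])

def cleanse(readings, lo, hi):
    """Drop readings whose value is missing or outside the range [lo, hi]."""
    for ts, sensor_id, value in readings:
        if value is not None and lo <= value <= hi:
            yield ts, sensor_id, value

def aggregate(readings, window):
    """Average values over consecutive tumbling windows of `window` time units."""
    current, bucket = None, []
    for ts, _sensor_id, value in readings:
        w = ts // window
        if current is not None and w != current:
            # The window has closed: emit its start time and mean value.
            yield current * window, mean(bucket)
            bucket = []
        current = w
        bucket.append(value)
    if bucket:
        yield current * window, mean(bucket)
```

Because each step is a generator, the three can be chained as `aggregate(cleanse(merge_streams(a, b), lo, hi), window)` without storing the raw data, which matches the assumption that raw data may not be kept.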

Level 2 – Model: A model at this level summarizes the data streams processed at Level 1. This summarization is not like the aggregation used in data warehouses; rather, it is a high-level view of what the stream data has looked like and currently looks like. A single model is created for all of the streams processed at Level 1. The model captures not only the data obtained by the sensors, but also the spatial aspect of the data (where the sensor is located) and the temporal aspect. It is important to note that the temporal aspect is not only the timestamp of the data, but also the ordering of the data from the sensors. In our overview we assume that a dynamic first-order Markov chain is used. This dynamic model not only summarizes the data, but also captures concept drift. Machine learning techniques allow the learning and forgetting of data over time. Clustering techniques are used to ensure a sub-linear growth rate of this model.
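To make the Level 2 idea concrete, the following is a minimal sketch of a dynamic first-order Markov chain whose states are clusters of observations. It is an illustration under stated assumptions, not the model used in this work: the class name `DynamicMarkovModel`, the nearest-center clustering with a fixed distance `threshold`, and the multiplicative `decay` used for forgetting are all hypothetical choices. An observation is a numeric tuple, which could include sensor location coordinates to reflect the spatial aspect.

```python
import math
from collections import defaultdict

class DynamicMarkovModel:
    """Hypothetical sketch: cluster centers are states, decayed counts are transitions."""

    def __init__(self, threshold, decay=0.99):
        self.threshold = threshold        # assumed: max distance to join a cluster
        self.decay = decay                # assumed: forgetting factor per update
        self.centers = []                 # one center per state (kept fixed here)
        self.counts = defaultdict(float)  # (from_state, to_state) -> weight
        self.current = None               # state of the previous observation

    def _nearest(self, point):
        best, best_d = None, float("inf")
        for i, center in enumerate(self.centers):
            d = math.dist(center, point)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def update(self, point):
        """Assign the observation to a state and record the transition."""
        # Forgetting: fade old transitions so the model can follow drift.
        for key in list(self.counts):
            self.counts[key] *= self.decay
        state, d = self._nearest(point)
        if state is None or d > self.threshold:
            # Learning: an observation far from every cluster becomes a new state.
            self.centers.append(tuple(point))
            state = len(self.centers) - 1
        if self.current is not None:
            # First-order: only the immediately preceding state matters.
            self.counts[(self.current, state)] += 1.0
        self.current = state
        return state

    def transition_probability(self, src, dst):
        total = sum(w for (s, _), w in self.counts.items() if s == src)
        return self.counts.get((src, dst), 0.0) / total if total else 0.0
```

The number of states grows only when an observation falls outside every existing cluster, so the model size depends on the variety of the data rather than on the length of the stream. The order in which observations arrive determines the recorded transitions, which is how the ordering aspect of time enters the model.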

Level 3 – Domain Expert: We call this level Domain Expert because we assume that its primary users are domain experts who need to examine the sensor data at an extremely high level. They rely upon the output of data mining applications applied to the model data. They will examine visual summaries of the data, the output of anomaly detection software, and other data mining output. The exact view that each domain expert has depends entirely on that expert's needs. No real data exists at this level. The external view is only visual.
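The push-based delivery of anomaly output to the domain expert can be sketched as follows. This is only an illustration under assumptions: the class name `AnomalyPublisher`, the `min_probability` cutoff, and the alert format are hypothetical, and the rule used here (a state transition is anomalous when its model probability is low) is one possible anomaly test, not necessarily the one used in this work.

```python
class AnomalyPublisher:
    """Hypothetical sketch: push low-probability transitions to subscribers."""

    def __init__(self, min_probability):
        self.min_probability = min_probability  # assumed anomaly cutoff
        self.subscribers = []                   # callbacks for expert-facing views

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def observe(self, src, dst, probability):
        """Push an alert if the observed transition is rarer than the cutoff."""
        if probability < self.min_probability:
            alert = {"from": src, "to": dst, "probability": probability}
            for callback in self.subscribers:
                callback(alert)
            return True
        return False
```

The expert never queries the stream: a view registers a callback once and receives only the events that call for a decision.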