Rare Event Detection
Data mining is used to detect anomalies, or rare events. An anomaly can
indicate a potentially dangerous situation in computer networks and other
systems. The two dominant approaches to rare event mining are unsupervised
and supervised techniques. Supervised techniques are classification-based:
a machine-learning algorithm is trained on pre-classified data. With data
labeled as belonging to normal or rare classes, supervised algorithms aim to
achieve high recall, high precision, or both. A defining characteristic of
rare class problems is the imbalance between the amount of data in the
normal classes and in the rare classes. A common solution is to use
sampling schemes that alter the data distribution so that this imbalance is
reduced in the training data.
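Two widely used sampling schemes are random oversampling of the rare class and random undersampling of the normal class. The Python sketch below shows both; the function name `rebalance`, its `ratio` parameter (desired rare-to-normal size ratio) and the `(features, label)` sample format are illustrative assumptions, not part of the original work. Resampling should be applied to the training data only, never to the evaluation data.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], str]  # (feature vector, class label)


def rebalance(data: List[Sample], rare_label: str, ratio: float = 1.0,
              undersample: bool = False, seed: int = 0) -> List[Sample]:
    """Return a training set whose rare class is about `ratio` times the
    size of the normal class (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    rare = [s for s in data if s[1] == rare_label]
    normal = [s for s in data if s[1] != rare_label]
    if not rare or not normal:
        return list(data)  # nothing to rebalance
    if undersample:
        # Keep every rare sample; randomly discard normal samples.
        keep = min(len(normal), max(1, round(len(rare) / ratio)))
        normal = rng.sample(normal, keep)
    else:
        # Keep every normal sample; duplicate randomly chosen rare samples.
        target = max(len(rare), round(len(normal) * ratio))
        rare = rare + [rng.choice(rare) for _ in range(target - len(rare))]
    out = rare + normal
    rng.shuffle(out)
    return out
```

Oversampling keeps all normal data but repeats rare examples, which can encourage overfitting to them; undersampling avoids duplicates but throws away normal data.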
Extensible Markov Models (EMMs) are well suited to modeling spatiotemporal
environments and detecting rare events. They support both supervised and
unsupervised detection. Although the basic EMM algorithms learn to identify
rare events in an unsupervised manner, pre-designated nodes representing
known target events can be added to the EMM. Scalability is achieved
because similar real-world events are clustered into a single EMM node. In
addition, nodes can be removed from the EMM or merged together if desired.
EMMs can identify events that are rare based on the events themselves
(spatial), the time at which they occur (temporal), or unusual transitions
between them.
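The Python sketch below illustrates these EMM operations under stated assumptions: each node stores a centroid and an occurrence count, an incoming event joins the nearest node if it lies within a Euclidean distance `threshold` (a simple nearest-neighbor clustering chosen for illustration), otherwise it creates a new node, and directed edges count observed transitions. The class `EMM` and its methods `update`, `add_target`, `remove` and `merge` are hypothetical names, not an API from the original work.

```python
import math
from typing import Dict, List, Optional, Tuple


class EMM:
    """Minimal Extensible Markov Model sketch (hypothetical design).

    Each node summarizes a cluster of similar events by a centroid and an
    occurrence count; each directed edge counts observed transitions.
    """

    def __init__(self, threshold: float):
        self.threshold = threshold  # max Euclidean distance to join a node
        self.centroids: Dict[int, List[float]] = {}
        self.counts: Dict[int, int] = {}
        self.labels: Dict[int, str] = {}  # pre-designated target nodes
        self.edges: Dict[Tuple[int, int], int] = {}
        self.current: Optional[int] = None  # node of the previous event
        self._next_id = 0

    def _new_node(self, x: List[float]) -> int:
        nid = self._next_id
        self._next_id += 1
        self.centroids[nid] = list(x)
        self.counts[nid] = 0
        return nid

    def nearest(self, x: List[float]) -> Tuple[Optional[int], float]:
        """Return the nearest node and its distance, or (None, inf) if empty."""
        best, best_d = None, math.inf
        for nid, c in self.centroids.items():
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = nid, d
        return best, best_d

    def add_target(self, x: List[float], label: str) -> int:
        """Add a node for a known target event before it is observed."""
        nid = self._new_node(x)
        self.labels[nid] = label
        return nid

    def update(self, x: List[float]) -> int:
        """Cluster one event into the model and record the transition."""
        nid, d = self.nearest(x)
        if nid is None or d > self.threshold:
            nid = self._new_node(x)
        n = self.counts[nid] + 1
        if nid not in self.labels:
            # Incremental mean; target nodes keep their designated centroid.
            self.centroids[nid] = [c + (xi - c) / n
                                   for c, xi in zip(self.centroids[nid], x)]
        self.counts[nid] = n
        if self.current is not None:
            key = (self.current, nid)
            self.edges[key] = self.edges.get(key, 0) + 1
        self.current = nid
        return nid

    def remove(self, nid: int) -> None:
        """Delete a node and every edge touching it."""
        for table in (self.centroids, self.counts, self.labels):
            table.pop(nid, None)
        self.edges = {k: v for k, v in self.edges.items() if nid not in k}
        if self.current == nid:
            self.current = None

    def merge(self, keep: int, drop: int) -> None:
        """Fold node `drop` into node `keep`, redirecting its edges."""
        n1, n2 = self.counts[keep], self.counts[drop]
        if n1 + n2:
            # Count-weighted centroid of the two clusters.
            self.centroids[keep] = [(a * n1 + b * n2) / (n1 + n2) for a, b in
                                    zip(self.centroids[keep], self.centroids[drop])]
        self.counts[keep] = n1 + n2
        merged: Dict[Tuple[int, int], int] = {}
        for (s, t), c in self.edges.items():
            key = (keep if s == drop else s, keep if t == drop else t)
            merged[key] = merged.get(key, 0) + c
        self.edges = merged
        if self.current == drop:
            self.current = keep
        for table in (self.centroids, self.counts, self.labels):
            table.pop(drop, None)
```

The choice of `threshold` controls granularity: too small a value creates many nodes, while too large a value merges genuinely different events into one node.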
Finally, the EMM rare event detection algorithm runs dynamically, in
quasi-real time, as the data arrives. The time required for each execution
of the algorithm is dominated by the clustering step, which in turn depends
on the number of EMM states rather than the number of real-world events.
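For a rough sense of this cost, assume (for illustration only) that clustering is a nearest-neighbor search comparing each incoming d-dimensional event with the representative of every current EMM node, and let m_t denote the number of nodes when the t-th event arrives. The per-event and cumulative costs are then approximately:

```latex
T_{\mathrm{event}}(t) = O(m_t\, d), \qquad
T_{\mathrm{total}}(n) = O\Big(\sum_{t=1}^{n} m_t\, d\Big) \le O(n\, M\, d),
\quad M = \max_{t \le n} m_t .
```

Because similar events share a node, M usually stays far smaller than n, so the cost of processing one event grows with the number of nodes, not with the number of events already seen. The symbols d, m_t and M are introduced here and do not appear in the original description.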
EMMs predict (detect) a rare event when a captured real-world event is not
close enough to any existing node in the graph, or when the transition
probability from the previous node to the closest node is low. In either
case, the current event has either rarely occurred in the past or has
rarely followed the previous state. We have examined the use of EMMs for
rare event detection with VoIP network traffic data as well as automobile
traffic data. In these large environments, EMM achieves scalability through
a distributed hierarchical approach. Similarly, anomalies in Web traffic can
be examined in a hierarchically distributed fashion.
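The two detection conditions above can be expressed as a small, self-contained Python check. This is a sketch under assumptions: nodes are represented by centroid vectors compared with Euclidean distance, the probability of moving from node i to node j is estimated as the number of i→j transitions divided by all transitions out of i, and the function names and the parameters `dist_threshold` and `prob_threshold` are hypothetical, not values from the original study.

```python
import math
from typing import Dict, List, Optional, Tuple

Edges = Dict[Tuple[int, int], int]  # (from_node, to_node) -> transition count


def transition_probability(edges: Edges, src: int, dst: int) -> float:
    """Estimate P(dst | src) as count(src->dst) / count(src->anything)."""
    out = sum(c for (s, _), c in edges.items() if s == src)
    return edges.get((src, dst), 0) / out if out else 0.0


def is_rare_event(x: List[float], centroids: Dict[int, List[float]],
                  edges: Edges, previous: Optional[int],
                  dist_threshold: float,
                  prob_threshold: float) -> Tuple[bool, Optional[int], str]:
    """Return (is_rare, closest_node, reason) for one incoming event."""
    if not centroids:
        return True, None, "no existing nodes"
    closest = min(centroids, key=lambda n: math.dist(x, centroids[n]))
    # Condition 1: the event is not close enough to any existing node.
    if math.dist(x, centroids[closest]) > dist_threshold:
        return True, closest, "event not close to any node"
    # Condition 2: the transition from the previous node is improbable.
    if (previous is not None
            and transition_probability(edges, previous, closest) < prob_threshold):
        return True, closest, "unlikely transition"
    return False, closest, "normal"
```

In a streaming setting this check would run on each event before the model is updated with it, so that the event is judged against what has been seen so far.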
Risk Level Assessment
Traffic anomalies are an important indicator of risk in computer networks.
However, anomaly detection techniques based on positive security methods
(which model normal behavior and treat any deviation from it as anomalous)
suffer from a high false alarm rate when a high detection rate is pursued.
Using a heuristic risk assessment model, we are able to reduce the false
alarm rate. The proposed operations are based solely on the synopsis of the
data stream profile captured by the EMM. Experiments conducted with VoIP CDR
(Call Detail Record) data provided by Cisco Systems show that, compared with
an anomaly detection model based on positive security, the proposed model
significantly reduces the false alarm rate while maintaining a high
detection rate.
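The comparison above is stated in terms of detection rate and false alarm rate. The Python sketch below computes both from per-record alarms and ground-truth labels using the common definitions (detection rate = TP/(TP+FN), false alarm rate = FP/(FP+TN)); the original experiments may define them differently. It also includes `risk_score`, a purely hypothetical graded score built from EMM synopsis quantities (node frequency and transition probability). It is not the heuristic risk assessment model described above, only an illustration of alarming on a graded risk rather than on every deviation.

```python
from typing import List, Tuple


def rates(alarms: List[bool], truth: List[bool]) -> Tuple[float, float]:
    """Return (detection rate, false alarm rate).

    detection rate   = TP / (TP + FN): share of true anomalies that raise an alarm
    false alarm rate = FP / (FP + TN): share of normal records that raise an alarm
    """
    pairs = list(zip(alarms, truth))
    tp = sum(1 for a, t in pairs if a and t)
    fn = sum(1 for a, t in pairs if not a and t)
    fp = sum(1 for a, t in pairs if a and not t)
    tn = sum(1 for a, t in pairs if not a and not t)
    detection = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return detection, false_alarm


def risk_score(node_freq: float, trans_prob: float, weight: float = 0.5) -> float:
    """Hypothetical heuristic: rarer nodes and rarer transitions give higher risk.

    node_freq and trans_prob are assumed to lie in [0, 1], e.g. a node's share
    of all observed events and the transition probability into that node.
    """
    return weight * (1.0 - node_freq) + (1.0 - weight) * (1.0 - trans_prob)
```

An alarm could then be raised only when the score exceeds a chosen threshold, and `rates` used to check how moving that threshold trades detection rate against false alarm rate.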