RIKARD LAXHAMMAR
Master's Thesis in Computer Science (20 credits) at the School of Computer Science and Engineering, Royal Institute of Technology, year 2007. Supervisor at CSC was Örjan Ekeberg. Examiner was Anders Lansner. TRITA-CSC-E 2007:046 ISRN-KTH/CSC/E--07/046--SE ISSN-1653-5715
Royal Institute of Technology School of Computer Science and Communication KTH CSC SE-100 44 Stockholm, Sweden URL: www.csc.kth.se
Foreword
This document constitutes my master's thesis at the department of Computer Science at the Royal Institute of Technology, Stockholm. The master's project has been commissioned by Saab Systems in Järfälla, Sweden, and has been done within the frame of their R&D efforts within the field of Situation Assessment. My closest colleagues at Saab Systems have been Andreas Lingvall, project manager of the Situation Assessment R&D project, and Johan Edlund, system developer within the Situation Assessment R&D project and also my formal supervisor at Saab. I appreciate the informal contact and discussions I have had with both Andreas and Johan throughout my master's project, especially during the initial part of the project. These discussions have been of great importance for my project, helping me understand the problems and issues related to situation assessment and sea surveillance. I would like to acknowledge Dr. Örjan Ekeberg, my supervisor at the Royal Institute of Technology, for his easy approachability and the feedback he has given me during my work. Finally, I would also like to thank Dr. Anders Holst, at the Swedish Institute of Computer Science, for the visit at his office and our discussions regarding anomaly detection in sea traffic.
Contents

1 Introduction  1
  1.1 Background  1
  1.2 Problem description  2
  1.3 Purpose and goal  2
2 Theoretical background  3
  2.1 Situation Awareness  3
  2.2 Data Fusion and Situation Assessment  4
  3.1 Traditional Rule-based Expert Systems  6
  3.2 Bayesian Networks  8
  3.3 Fuzzy Reasoning, Fuzzy Sets & Fuzzy Logic  12
  3.4 Case-based Reasoning  14
  3.5 Anomaly Detection based on Machine Learning  15
  3.6 Conclusions of theoretical study  20
  4.1 Data Description  23
  4.2 The feature models  24
  4.3 Cluster models and clustering algorithms  25
5 Implementation  30
  5.1 Data pre-processing  30
  5.2 Software packages  31
  5.3 Training the models  33
  5.4 Performing Anomaly Detection  33
6 Experimental evaluation  35
  6.1 Experimental setup  35
  6.2 Results  36
  7.1 Degree of vigilance of the systems  56
  7.2 Anomaly Detection in unlabelled data  56
  7.3 Anomaly detection in labelled data  57
  7.4 Anomaly detection in artificial data  58
  7.5 The feature models  58
  7.6 Comparing MoG and ART in general  58
  7.7 Future work and improvements  59
8 Conclusion  62
References  64
1 Introduction
1.1 Background
At Saab Systems, research and development of systems for tracking and identification of moving objects has been carried out for many years. The objects of interest may be airborne, land-borne or vessels at sea. The tracking process presents to the supervisor the current state of the operational picture, including a number of objects and their momentary position, velocity and motion history since they were first detected.
Figure 1.1: An example of an operational picture in the context of sea surveillance, where the surveillance area corresponds to the Öresund strait between Sweden and Denmark. The trapezoid-shaped objects correspond to tracked vessels.

To enhance a supervisor's understanding of what is significant in the operational picture, beyond the existence of multiple unidentified or identified objects travelling in different directions, research within the field of situation assessment has been pursued. The purpose of situation assessment is to increase a supervisor's awareness of the current operational picture by finding and identifying relations between objects and their environment. Fundamental to the analysis is an ontology which serves as a model of a specific domain (or subset) of the world. The ontology describes relevant concepts in the domain, such as objects, events, rules, relations and situations that are of interest to the supervisor. Interesting relations and situations that the ontology describes can be extracted from the operational picture, effectively enhancing the situation awareness of the supervisor.

A prototype rule-based expert system has been developed at Saab Systems for performing situation assessment within the domain of sea surveillance. The system is able to identify a number of basic kinematical relations between objects and then deduce different situations of interest in a simulated real-time sea surveillance environment. The domain knowledge required for creating adequate models and rules for identifying situations of interest, like smuggling, piloting, hijacking etc., has been acquired by consulting domain experts within the Swedish Naval Intelligence Battalion.
2 Theoretical background
This chapter describes the concepts of Situation Awareness, Data Fusion and Situation Assessment and how these concepts relate to each other.
[Figure: Endsley's model of Situation Awareness — Level 1: Perception of Elements in Current Situation, Level 2: Comprehension of Current Situation, Level 3: Projection of Future Status — followed by Decision and Performance of Actions, with a feedback loop.]
In the figure, SAW is modelled as a stage before the cognitive process of decision making. SAW can be thought of as the operator's internal model of the state of the environment. This representation serves as the basis for the operator's decisions about what to do about the current situation and how to carry out any necessary actions. Relevant to the definition of SAW is a notion of what is important. For a given operator, the specific goals and objectives associated with the current job are highly related to SAW. For example, the arrival of a hostile ship would be a significant event to a coast guard, in contrast to the sighting of a seagull, which is an irrelevant event in this context. The limitations of the human mind have been proven to have a significant impact on the degree of SAW for humans operating in dynamic environments with large amounts of
information available [1, 8]. Therefore there has been great interest in developing techniques and systems that support the cognitive processes of SAW, from the low level of perception to the high level of decision making.
The different logical levels of data fusion are briefly described below.

Level 0 & 1 - Object Assessment: These levels correspond to the process of object fusion, utilizing the information from multiple data sources over time to assemble a representation of objects of interest in the environment. These object assessments can include estimations and predictions of kinematics (speed vector) and target type/ID.

Level 2 - Situation Assessment: This level essentially involves association and combination of level 1 objects into aggregations, identifying relations. The relations can be of different types, including spatial, temporal, organizational etc. The identification of such relations between objects in the environment constitutes Situation Assessment (SA). It is important to understand the difference between SA, being a process, and SAW, which represents a state that has been achieved through [the process of] SA.
Level 3 - Impact Assessment: Impact Assessment corresponds to the estimation and prediction of the effects of planned, estimated or predicted actions, involving the effects of different situations in the environment.
Level 4 - Process Refinement: Process Refinement is an adaptive process that identifies what is required to improve the level 1, 2 and 3 assessments and how sensors should be configured to obtain the most relevant data for improving the assessments.
It is not difficult to see the correspondence between Endsley's cognitive model of SAW and the process of data fusion just described [2]. However, it should be pointed out that the purpose of data fusion, as opposed to SAW, is to maintain a model of the situation external to the operator [3]. Thus, in order for a computer to support SAW in the mental domain, the data fusion model persisting in the technical domain must be communicated through a Human/Computer Interface.
Figure 3.1: Overview of a Rule-based Expert System. A domain expert's knowledge is encoded by a knowledge engineer into the rule base; the inference engine applies the rules of the rule base to the fact base.

Defining an ontology for the different types of objects, their relations and the situations of interest, and using a Rule-based Expert System to reason about these concepts, is one approach for performing SA. By identifying the objects and the basic relations between them in the situation picture, the rules can be used to infer higher-order relations corresponding to situations of interest for the operator. Variations of this approach have been used by Saab Systems [5] and other research groups [4] for achieving SAW (read more about these approaches below). One of the main advantages of rule-based systems is that they are relatively straightforward to implement. There are many inference engines available for developing customized expert systems simply by defining the rules. The fact that domain knowledge is expressed as rules makes the knowledge transparent and easy to interpret. However, one major drawback of classical rule-based systems is the lack of uncertainty modelling. Even though variations of rule-based systems incorporating uncertainty values for
rules and propositions have been developed and implemented, this approach implies serious restrictions and requires careful consideration when developing the rule base [16]. Therefore, this approach is generally not recommended.
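The interplay between the rule base, the fact base and the inference engine can be illustrated with a minimal forward-chaining sketch in Python. The facts, rules and the smuggling-style conclusion below are invented for illustration; a production engine such as JESS uses far more efficient pattern matching and a richer rule language.

```python
# Minimal forward-chaining inference: rules fire repeatedly until no new
# facts can be derived. Facts are plain strings; each rule maps a set of
# antecedent facts to one conclusion.
def infer(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in rules:
            if conclusion not in facts and antecedents <= facts:
                facts.add(conclusion)   # rule fires: assert the conclusion
                changed = True
    return facts

# Hypothetical rule base for a sea-surveillance situation.
rules = [
    ({"close(A, B)", "low_speed(A)", "low_speed(B)"}, "rendezvous(A, B)"),
    ({"rendezvous(A, B)", "unidentified(B)"}, "suspected_smuggling(A, B)"),
]

derived = infer({"close(A, B)", "low_speed(A)", "low_speed(B)",
                 "unidentified(B)"}, rules)
```

Note how the second rule only fires once the first has added its conclusion to the fact base, mirroring how higher-order relations are inferred from basic ones.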
ontology defining the domain concepts that need to be communicated between the different agents. Each agent has a specific task, including among others:
- Extracting and pre-processing input information to the system (Input agent)
- Finding basic relations, like kinematical and spatial relations between objects (Basic Relation Finder agent)
- Reasoning about relations and situations (Reasoner agent)
- Providing an interface and functionality for rule definition (Rule Editor agent)
- Providing database services (Database agent)
The ontology, being a data structure, is defined in the semantic web language Web Ontology Language (OWL). The first-order rules are defined in the OWL-compliant language Semantic Web Rule Language (SWRL). The reasoner agent is implemented with the Java rule-based engine JESS. However, because JESS doesn't support OWL, rules defined in SWRL have to be translated to the JESS syntax by an auxiliary translation interface [5].
BBNs are based on Bayes' rule, which allows the computation of the posterior probability associated with a particular proposition, which could be an object or relation, given the prior probabilities and conditional probabilities associated with it. The BBN can be represented as a graph, where the nodes correspond to the propositions and the directed edges correspond to the causal relationships between the propositions. When setting up a BBN, prior probabilities and conditional probabilities have to be specified for each node in the network. Then, posterior probabilities for each node can be updated according to Bayes' rule as new information is presented to the network. In particular, new information regarding the probability of objects and basic relations can be used by the BBN for fusion to assess the probability of higher-level relations or situations. Two important requirements are pointed out by Gonsalves and his colleagues when using BBNs for SA in dynamic real-time environments [8]:

1. Rapid modelling of complex situations via BBNs
2. Efficient BBN inference based on incoming evidence

For rapid modelling, they suggest that each BBN is constructed in real time from a library of smaller component-like BBNs to assess a specific situation. To address the issue of efficient inference, they propose a way in which a BBN can be broken up into sub-networks and distributed across multiple computers, allowing computations to be carried out in parallel. In addition, the distribution mechanism allows computation at various levels of abstraction and granularity, suitable for hierarchical organizations. The powerful model for handling uncertainty and causal relationships, combined with a relatively straightforward implementation, makes the Bayesian Belief Network a very attractive model. However, BBNs are essentially propositional: the set of variables is fixed and finite and each variable has a fixed domain of possible values [16].
Regular BBNs lack the concept of objects and relations and thus cannot take full advantage of the structure of the domain or of reuse [11]. These facts limit the application of BBNs in complex domains. To handle these constraints, different extensions to BBNs called Relational Probabilistic Models (RPM) [11, 16] have been proposed. The first-order RPMs support much more complex models, including generic objects and relations, and are able to express facts about some or all objects. Another weakness of regular BBNs is that they lack a natural mechanism for temporal reasoning. Various forms of Dynamic Bayesian Networks have been proposed for handling this [16]. However, it is still possible to model temporal sequence constraints in a regular BBN by including multi-state nodes corresponding to a memory [7]. Finally, because Bayesian methods are based on probability theory, adequate statistics are required for making good models. Without a sufficiently large amount of data (training examples) or other prior knowledge, approximating the various probability distributions can be a difficult task. Furthermore, modelling the causal relationships may also prove a non-trivial task in a complex domain.
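As a minimal illustration of the Bayes' rule update described above, the following Python sketch computes the posterior probability of a hypothetical proposition given one piece of evidence. All probabilities are invented for the example.

```python
# Posterior update in a minimal two-node Bayesian network (numbers invented):
# a hidden proposition S ("vessel is smuggling") with prior P(S), and an
# observable proposition R ("rendezvous observed") with P(R|S) and P(R|~S).
def posterior(p_s, p_r_given_s, p_r_given_not_s):
    # Bayes' rule: P(S | R) = P(R | S) P(S) / P(R),
    # with P(R) obtained by marginalizing over S.
    p_r = p_r_given_s * p_s + p_r_given_not_s * (1.0 - p_s)
    return p_r_given_s * p_s / p_r

p = posterior(p_s=0.01, p_r_given_s=0.8, p_r_given_not_s=0.05)
# A rare prior combined with a telling observation raises the belief
# in smuggling by more than an order of magnitude.
```

In a real BBN this update is chained through the whole graph; the sketch shows only the single-edge case.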
recursive tree we find primitive events and primitive relationships between objects. These primitives constitute the base elements of the event hierarchy and need to be measured directly from the situation picture, which may involve different context-specific extraction algorithms. Typical primitives in the context of SA could be spatial relations between objects (in front of another object, to the right, etc.), kinematical relations (approaching, etc.), states of objects (inside a region, etc.) or simple events corresponding to changes in direction of motion.
Figure 3.2: An example of a Bayesian Network for the complex event Smuggling, with input nodes and state nodes. The scenario involves smuggling as vessel B picks up illegal cargo previously dumped by vessel A. Notice the multi-state node Sequence that keeps track of the sequential state of the sub-events and the multi-state node Memory that stores the previous state of Sequence.
Taking a holistic view, the system can be seen as consisting of a number of instantiated event networks, where each network is associated with an instance of a complex event and the specific objects related to it. The dynamics of these networks are event driven in the sense that changes in primitive relationships between objects and sub-events will cause the network to process; new output from a network is only produced as a consequence of a change in an event state. Higgins [7] points out that Bayesian networks do not necessarily require training data for configuration. This fact can be exploited when there is a lack of adequate training data, which is common when one tries to capture unusual events and behaviour. Under such circumstances the structure and conditional probabilities for the network may be defined explicitly based on the experience of a domain expert. However, assessing causal relationships and probabilities may still be far from trivial, even for a domain expert. One advantage of Bayesian Networks is their capability of reasoning with uncertainties, allowing uncertainty in the input and providing a measure of probability for the output, i.e. the probability that an event has occurred or the probability that a complex event is in a particular state. Thus uncertainties in the primitives and the simple events may be propagated upwards through the hierarchy of events in a consistent manner.
Fuzzy sets
Figure 3.3: A comparison of classical crisp set boundaries versus fuzzy set boundaries for the concepts of short and tall. Notice in the case of crisp sets that a person measuring 174 cm is considered absolutely short while a person measuring 176 cm is considered absolutely tall, despite the insignificantly small difference in length.

Fuzzy logic is a technology that allows realistic, complex models of the real world to be defined with simple qualitative (fuzzy) descriptions of the conditions and rules. In particular, systems based on fuzzy inference with fuzzy rules have been very successful within the field of complex process control [22]. The general methodology in this context is first fuzzification of sensor input data from numerical to linguistic (fuzzy set) format, then evaluation of the fuzzy rules through fuzzy inference and finally de-fuzzification of the output (the conclusions of the fuzzy rules) to numerical format (serving as input to the process).
Figure 3.4: Overview of a Fuzzy Rule System in the context of complex process control.

It is important to point out that the type of uncertainty captured by fuzzy logic is generally related to vagueness and imprecision rather than pure probability, which is associated with statistics. In the context of SA, concepts like objects, states of objects, properties and relations between objects could be modelled as fuzzy sets, which allows for more abstract and general rules. For example, the fuzzy kinematical relation approaching could describe to what degree an object A is approaching another object B. If object A is heading straight at object B and closing in on it, they would fulfil the relationship to a degree close to one. However, the more the course of object A deviates from the course straight to B, the lower the degree of membership for the relation of approaching between these objects. In contrast, in a probability approach this relation would be considered either true or false with a certain probability.
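As a rough sketch of how such a fuzzy relation could be implemented, the following Python function assigns a degree of membership for "approaching" based on bearing error and closing speed. The particular membership shape (linear fall-off to 90 degrees) is an invented assumption, not taken from the literature.

```python
# Fuzzy membership for the kinematical relation "A is approaching B"
# (membership shape is an invented example): membership is 1 when A heads
# straight at B, falls off linearly to 0 as the bearing error grows to
# 90 degrees or more, and is 0 when A is not closing in at all.
def approaching(bearing_error_deg, closing_speed):
    if closing_speed <= 0.0:
        return 0.0                          # moving away: no membership
    return max(0.0, 1.0 - abs(bearing_error_deg) / 90.0)

m1 = approaching(bearing_error_deg=0.0, closing_speed=5.0)    # head-on
m2 = approaching(bearing_error_deg=45.0, closing_speed=5.0)   # oblique
m3 = approaching(bearing_error_deg=10.0, closing_speed=-1.0)  # moving away
```

The graded values (rather than a crisp true/false) are exactly what distinguishes the fuzzy relation from its probabilistic counterpart.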
the fact base. To increase flexibility, fuzzy weights for each proposition in the antecedent part of the rule are supported, effectively giving priority to the more important conditions in the rule. In addition to this, the rules themselves can be (fuzzy) weighted to reflect their relative importance. Variations of this approach involving Petri nets [25, 26] have also been proposed, where the reasoning is more goal-oriented in the sense that the nets try to infer truth values for specific propositions based on a number of given truth values.
Figure 3.5: The principal idea of the Holst model. Normal (training) data is illustrated by the green data points in the velocity feature space. The two-dimensional Gaussian distribution modelling normal vessel velocities in the indicated area is illustrated by its mean and variance. A (red) point corresponding to an anomaly is shown.

Taking it to a global level, features of vessel motion can be analyzed over time by considering trajectories. By clustering similar trajectories corresponding to regular traffic, a model of normal vessel routes can be constructed [17]. This approach is described in more detail below. Furthermore, by extracting local higher-level events corresponding to manoeuvres in the motion patterns, it is possible to characterize more abstract features of the motion. An example of such a model has been proposed in this project and is presented and discussed later on in the thesis.
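The principal idea of the Holst model — flagging velocities that are improbable under a Gaussian fitted to normal traffic — can be sketched as follows. The sketch is simplified to an axis-aligned two-dimensional Gaussian, and the data and threshold are invented for illustration.

```python
# Fit an axis-aligned 2-D Gaussian to normal velocity vectors (vx, vy) and
# flag new observations whose squared, variance-normalized distance from the
# mean is large. Data and threshold are invented for illustration.
def fit_gaussian(points):
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in (0, 1)]
    var = [sum((p[i] - mean[i]) ** 2 for p in points) / n for i in (0, 1)]
    return mean, var

def is_anomalous(point, mean, var, threshold=9.0):   # ~3 std devs per axis
    d2 = sum((point[i] - mean[i]) ** 2 / var[i] for i in (0, 1))
    return d2 > threshold

# Invented "normal" velocities: eastbound traffic around 10 knots.
normal = [(10.0, 0.5), (11.0, -0.5), (9.0, 0.0), (10.0, 1.0), (10.0, -1.0)]
mean, var = fit_gaussian(normal)
```

A vessel travelling westbound at the same speed, for example, would fall far outside the fitted distribution and be flagged.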
3.5.2.1
A powerful statistical model for approximating arbitrary distributions is the Mixture of Gaussians (MoG) densities model [27]. A MoG consists of a number of multivariate Gaussian distributions known as mixture components, where each component c has its own parameters: mean value ($\mu_c$) and covariance ($\Sigma_c$). In addition to these parameters, each component has an associated weight ($\pi_c$), where all weights are non-negative and sum to one. The problem is how to place these Gaussians in the feature space (i.e. finding the parameters) so that all data points more or less belong to a corresponding component distribution. This process can be regarded as a clustering problem where each mixture component corresponds to a cluster. A popular technique for the parameter estimation is the Expectation-Maximization (EM) algorithm [15, 16, 27], which incrementally finds a set of parameters that maximizes the likelihood of the training data. The algorithm consists of two main steps that are performed iteratively until a certain end condition is fulfilled, usually some convergence condition. During the Expectation step, the algorithm estimates, for each data point $\mathbf{x}$ in feature space, the probability that the point was generated by each particular component c, taking into account the component weights $\pi_c$. This is done by computing the posterior probability $p(c \mid \mathbf{x})$ for each data point $\mathbf{x}$ and each component c based on Bayes' rule:

$$ p(c \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid c)\,\pi_c}{p(\mathbf{x})} = \frac{p(\mathbf{x} \mid c)\,\pi_c}{\sum_{c'} p(\mathbf{x} \mid c')\,\pi_{c'}} \qquad (3.1) $$

where $p(\mathbf{x} \mid c)$ corresponds to the likelihood of component distribution c generating data point $\mathbf{x}$. This expectation of point-to-component association is then used in the Maximization step, where the estimated parameters $\pi_c$, $\mu_c$, $\Sigma_c$ of each component are updated according to a maximum likelihood criterion (maximization). The Maximization step involves adjusting the parameters of each component in such a way that the component better fits the data points, taking the posterior probabilities $q_{nc}$ into account, where $q_{nc} = p(c \mid \mathbf{x}_n)$. More specifically, updating the parameters is done according to formulas 3.2 to 3.4:

$$ \pi_c = \frac{1}{N} \sum_{n=1}^{N} q_{nc} \qquad (3.2) $$

$$ \mu_c = \frac{1}{N \pi_c} \sum_{n=1}^{N} q_{nc}\,\mathbf{x}_n \qquad (3.3) $$

$$ \Sigma_c = \frac{1}{N \pi_c} \sum_{n=1}^{N} q_{nc} (\mathbf{x}_n - \mu_c)(\mathbf{x}_n - \mu_c)^T \qquad (3.4) $$
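For concreteness, the two steps can be sketched in Python for a one-dimensional mixture. The data and starting parameters are invented, and a real implementation would use full covariance matrices, log-space computations and a convergence test rather than a fixed iteration count.

```python
import math

# One EM pass for a one-dimensional Mixture of Gaussians, following the
# updates of equations (3.1)-(3.4); data and starting values are invented.
def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(data, weights, means, variances):
    n, k = len(data), len(weights)
    # E-step: posterior q[i][c] = p(c | x_i), eq. (3.1)
    q = []
    for x in data:
        joint = [weights[c] * gauss(x, means[c], variances[c]) for c in range(k)]
        total = sum(joint)
        q.append([j / total for j in joint])
    # M-step: re-estimate weights, means and variances, eqs. (3.2)-(3.4)
    nc = [sum(q[i][c] for i in range(n)) for c in range(k)]
    weights = [nc[c] / n for c in range(k)]
    means = [sum(q[i][c] * data[i] for i in range(n)) / nc[c] for c in range(k)]
    variances = [sum(q[i][c] * (data[i] - means[c]) ** 2 for i in range(n)) / nc[c]
                 for c in range(k)]
    return weights, means, variances

data = [0.9, 1.1, 1.0, 4.8, 5.2, 5.0]          # two well-separated clusters
w, m, v = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]   # crude initialization
for _ in range(20):
    w, m, v = em_step(data, w, m, v)
```

With this well-separated data the component means settle near the two cluster centres; poorly chosen starting values could instead lead to one of the local maxima discussed below.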
The updated model is then used to compute updated posterior probabilities in the Expectation step, and so on; i.e. the Expectation and Maximization steps are repeated in turn to incrementally improve the model. However, one of the major drawbacks of the classical EM algorithm is that it is very sensitive to initialization. Depending on the starting values of the parameters, the algorithm may (and most probably will) converge to a local maximum that is different from the global maximum. Therefore it is very common to perform multiple runs, restarting the algorithm with a new, more or less random, set of initial parameters each time it gets stuck in a local maximum. The parameters and the corresponding data likelihood are saved for each run, i.e. for each local maximum. Finally, the model corresponding to the maximum data likelihood is chosen. Another problem is that the model tends to degenerate during the EM process as some components may collapse. These collapses occur when components centre on a very small cluster of points and shrink until they cover only a single point, i.e. the variance approaches zero. The classical EM algorithm does not provide any inherent mechanism for determining the number of mixture components. Various extensions to EM have been proposed to solve this problem, some of them involving greedy approaches in which components are added incrementally [27].

3.5.2.2 Anomaly Detection based on Trajectory Clustering
An Italian research group involved in video surveillance has proposed an on-line clustering algorithm that is able to group trajectories in real time [17]. Here the clusters are dynamic, built in real time as new trajectory data is acquired, thus enabling on-line learning. The trajectories are represented as a list of vectors, each encoding the spatial position (x- and y-coordinates) of an object at each time interval. The clusters are represented in a similar way: a list of vectors encoding the main trajectory and a local approximation of the cluster variance in each time interval. Because the time interval between each point in the trajectory is fixed, objects sharing the same spatial trajectory, but having different speeds, will still correspond to different clusters. In order to check if a trajectory fits a given cluster, a distance measure must be defined. Because the Euclidean distance performs poorly in the presence of time shifts, the authors propose an alternative measure that incorporates the time aspect better. When a match is found between a trajectory and a cluster, the cluster is updated in such a manner that each cluster is a dynamic approximation of the mean and variance of all the trajectories that have so far matched it. To increase the model's ability to quickly adapt to changes in the normal behaviour over time, the weights of older tracks decrease exponentially with time. The trajectory model can be built up as a forest of hierarchical tree structures, where each node represents a cluster segment. A trajectory can then be represented as a path in a tree.
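A simplified sketch of the matching-and-update step is given below. For brevity it uses a point-wise Euclidean distance instead of the authors' time-shift tolerant measure, and the threshold, learning rate and trajectories are invented.

```python
# On-line trajectory clustering, simplified: a cluster is a list of (x, y)
# mean positions at fixed time intervals; a new track that matches the
# cluster pulls the means towards it with exponential forgetting.
def distance(traj, cluster_mean):
    # Mean point-wise Euclidean distance between track and cluster.
    return sum(((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(traj, cluster_mean)) / len(traj)

def update(cluster_mean, traj, alpha=0.1):
    # Exponential forgetting: old trajectories lose weight at rate (1 - alpha).
    return [((1 - alpha) * mx + alpha * x, (1 - alpha) * my + alpha * y)
            for (mx, my), (x, y) in zip(cluster_mean, traj)]

cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # a learned route
track   = [(0.0, 0.2), (1.0, 0.2), (2.0, 0.2)]   # a new observed track
if distance(track, cluster) < 0.5:               # match: adapt the cluster
    cluster = update(cluster, track)
```

A track whose distance exceeds the threshold for every existing cluster would instead seed a new cluster, which is the anomaly-detection case discussed below.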
Figure 3.6: Part a) of the figure illustrates some recorded trajectories. In b) similar trajectories are first clustered into main classes. These classes are then segmented in c). In d) each cluster segment corresponds to a node in a hierarchical tree structure.

Each trajectory cluster segment (i.e. node in a tree) can be assigned a certain probability based on the number of trajectories that are associated with it (i.e. the number of trajectories passing the cluster segment). The probability of a certain trajectory can then be evaluated as the product of the constituent cluster segment probabilities. An anomalous trajectory will thus appear as a path in the tree structure having a very small probability. The cluster model is dynamic in the sense that it can be updated incrementally in real time as new trajectories are discovered. If a new trajectory is identified, the probabilities associated with each cluster segment are updated. If the new trajectory deviates too much from all of the previous paths, the model is extended by adding one or more nodes to the tree. The latter case obviously constitutes an anomaly detection. In addition to anomaly detection, the clusters can also be used as feedback for a low-level tracking system; information about the clusters is then used to enhance the predictions of the kinematical models based on linear Kalman filters.

Another approach to trajectory analysis has been proposed by the team from BAE Systems mentioned earlier [13]. They have developed a system that predicts near-future vessel location based on vessel type, current location, velocity and course. The prediction model is based on a feedback neural network trained by unsupervised associative learning, where the weights are updated via Hebbian learning. If the difference between the prediction and the observation is larger than a certain threshold, the system generates an alert of anomalous vessel motion.
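The evaluation of a trajectory as a product of cluster segment probabilities can be sketched as follows; the segment names, counts and alert threshold are invented for illustration.

```python
# Probability of a trajectory as the product of the probabilities of the
# cluster segments (tree nodes) it passes. Each segment probability is the
# fraction of observed trajectories that passed it; counts are invented.
def path_probability(path, counts, total):
    p = 1.0
    for segment in path:
        p *= counts.get(segment, 0) / total   # unseen segment -> probability 0
    return p

counts = {"root": 100, "north": 80, "harbour": 75, "open_sea": 5}
normal  = path_probability(["root", "north", "harbour"], counts, total=100)
unusual = path_probability(["root", "open_sea"], counts, total=100)
anomaly = unusual < 0.1      # flag low-probability paths
```

A trajectory requiring an entirely new node would receive probability zero under the existing tree, which is the most extreme form of the same test.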
In the context of SA, both approaches described above could be used for the analysis of basic motion patterns of vessels. Extracted suspicious behaviour related to specific vessels can then be used as input to some higher-level reasoning system (e.g. a rule-based expert system), effectively fusing information to achieve a higher level of SAW. Because the character of the motion trajectories varies depending on the type of vessel, it is important to create different trajectory models for different vessel classes. In particular, the motion pattern of large cargo ships is generally more constrained and follows relatively well-defined routes. A good example of such vessels is car ferries that operate in a very
constrained environment. In contrast, the motion pattern of small private boats is generally irregular and differs from that of larger operational vessels. Thus models based on trajectory clustering are probably most suitable for detecting larger vessels that deviate significantly from their normal operational routes.
with most conventional rule-based systems, the prototype system lacks the ability to reason with uncertainties and it has no support for automatic learning over time; two features that are, however, more or less pursued by developers at Saab.
self-organize in order to find anomalous patterns. However, it should be remarked that some initial supervised knowledge is still required in order to construct an adequate feature model capable of finding relevant anomalies. An anomaly detection agent could work in parallel with the reasoning agent and support data fusion by supplying the reasoning agent with additional qualified input. As an example, the anomaly detection agent might identify a vessel having a more or less anomalous traffic pattern and classify it as suspicious. This information is then supplied to the reasoning agent, which may combine this fact with other facts to infer a particular situation, or simply focus attention on this particular vessel. Thus, an anomaly detection agent should be regarded as a complement to the existing prototype rather than an alternative solution.
Figure 4.1: Overview of the surveillance area. The red delineation roughly delimits the surveillance area.

The main part of the data consists of two continuous recordings: nine days of continuous autumn traffic during the period 11/11 to 20/11 2006 and seven days of summer traffic
during the period 1/7 to 7/7 2006. These recordings are assumed to reflect typical vessel traffic, containing a very low level of anomalies. In addition, six shorter scenarios, recorded during January, February, March, April and May, were also supplied, each having a duration of 2-6 hours. Common to all the shorter scenarios is the presence of a known anomalous situation. Two of the scenarios involve a collision between two larger vessels: a large passenger ferry with a smaller freight vessel, and a large foreign tanker with a smaller freight vessel. In three of the other scenarios the actual event is a grounding, while in the last scenario the reason for the stop is unknown but could be related to a potential smuggling scenario.

The tracks of the moving targets (vessels) are stored as rows in MS Access databases, where each row corresponds to a vessel report (data point). The columns of the Access database represent the attributes of the vessel reports, including, among others, target ID, latitude, longitude, absolute speed, course, timestamp and AIS number (MMSI). The majority (approximately 90%) of the tracks in the database correspond to data generated by AIS. These vessels are assigned an MMSI number different from zero. The rest of the tracks correspond to vessels identified and tracked by MST systems and are generally smaller ships lacking AIS (and thus having MMSI equal to zero).

Apart from the labelled data corresponding to the vessels involved in the anomalous situations in the shorter scenarios, the data supplied in this project is unlabelled, which motivates the use of unsupervised learning techniques. It is assumed that a great majority of the unlabelled vessel traffic corresponds to regular traffic. However, more or less anomalous patterns may still be present in this data. This assumption is of key importance, as it influences the learning strategy and the interpretation of the results during anomaly detection based on unsupervised clustering.
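The AIS/MST split described above can be recovered directly from the MMSI attribute of each report. A minimal sketch in Python (the project itself used MATLAB and MS Access; the dictionary keys below are illustrative, not the actual column names):

```python
def split_by_source(reports):
    """Separate AIS-equipped vessels (MMSI != 0) from radar-only MST tracks."""
    ais = [r for r in reports if r["mmsi"] != 0]
    mst = [r for r in reports if r["mmsi"] == 0]
    return ais, mst

# Hypothetical vessel reports: one AIS-equipped, one tracked only by MST
reports = [{"target_id": 1, "mmsi": 265547250},
           {"target_id": 2, "mmsi": 0}]
ais, mst = split_by_source(reports)
print(len(ais), len(mst))  # 1 1
```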
The Automatic Identification System (AIS) is a system used by ships and vessel traffic services, principally for identification of vessels at sea. Most ships using AIS are large operational vessels, and international law requires all ships of 300 gross tons or more to be fitted with AIS. For more information on AIS: http://www.imo.org/

The Multi Sensor Tracker (MST) is a tracking system developed by Saab that is based on data fusion of multiple sensors, in this case radars. For information on Saab's MST: http://products.saab.se/PDBWeb/ShowProduct.aspx?ProductId=808
This model was chosen as it serves as a simple base model for investigating anomaly detection. The simplicity of the model makes it rather generic and therefore applicable in other similar domains.
roughly the same size, a single Gaussian will centre close to zero (i.e. approximately in the middle of the two clusters) and grow rather wide in order to cover both clusters. Data points located close to zero will thus be regarded as perfectly normal occurrences, which is not what we would expect in reality. On the contrary, such points may correspond to vessels that have stopped in the middle of a travelling lane and should thus be regarded as potential anomalies. In order to model regular data more efficiently, other approaches involving more complex cluster models have been investigated. In particular, two models that support multiple clusters have been implemented and evaluated in this project: the MoG model and the Fuzzy ART Neural Network model.
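The failure mode of the single Gaussian can be reproduced in a few lines. The sketch below (in Python rather than the MATLAB used in this project, with synthetic two-cluster data) fits a single Gaussian by maximum likelihood and shows that the empty midpoint receives a higher density than an actual cluster centre:

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian."""
    d = len(mean)
    diff = np.asarray(x) - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(cov, diff))

rng = np.random.default_rng(0)
# Two well-separated clusters, e.g. traffic moving in opposite lane directions
data = np.vstack([rng.normal(-8, 1, (500, 2)), rng.normal(8, 1, (500, 2))])

# Maximum-likelihood fit of a single Gaussian: it centres between the clusters
mean, cov = data.mean(axis=0), np.cov(data.T)

# The empty midpoint gets a higher density than a cluster centre, so a vessel
# stopped between the lanes would be regarded as perfectly normal
print(mvn_logpdf([0, 0], mean, cov) > mvn_logpdf([8, 8], mean, cov))
```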
To solve the problem of finding the optimal number of components, and to avoid the need to run the algorithm multiple times from random initializations, an extension of the classical EM algorithm has previously been developed [27]. The algorithm, known as greedy EM learning, is based on a greedy approach that iteratively builds the optimal mixture model by adding new components one at a time. Instead of starting with a more or less random configuration of a predefined number of components and improving upon this configuration with regular EM, the mixture model is built component by component. The algorithm starts by determining the optimal one-component mixture. It then repeats two steps until a stopping criterion is met: 1) insert a new component, and 2) apply EM until convergence. Before inserting a new component, a set of randomly initialized candidate components is first evaluated in order to determine the optimal new component for insertion in the current mixture model, i.e. the candidate component that maximizes the likelihood of the data in the new mixture. The evaluation of the candidate components is done by partial EM searches, one for each candidate in the set, where the parameters of the existing components are fixed, i.e. optimization is done only over the parameters of each candidate component. The reason for performing partial EM searches is simply one of computational complexity. For each insertion problem, a fixed number of candidates per existing mixture component is considered when searching for the optimal component. Thus, the number of
candidates considered in each iteration increases linearly with the number of mixture components.
Figure 4.2: Sub-figures (a)-(d) illustrate the construction of a 4-component MoG during greedy EM. Each Gaussian component is shown as an ellipse, centred at the corresponding mean value and with a spread corresponding to the component's variance.

To evaluate the performance of the current mixture model, the log-likelihood of a test set is calculated after each new component insertion. If the log-likelihood has decreased since the previous insertion, the stopping criterion is fulfilled and the algorithm terminates, returning the previous mixture model as the solution. Should the number of mixture components exceed a certain predefined threshold, the algorithm also terminates, returning the current mixture model.
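The growth-and-stop scheme can be sketched as follows. This is a simplified Python illustration (the project itself used MATLAB) built on scikit-learn's GaussianMixture: full refits of increasing size stand in for the candidate-insertion step with partial EM searches, but the held-out stopping criterion described above is kept:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def grow_mog(train, test, max_components=20, seed=0):
    """Add components until the test-set log-likelihood stops improving."""
    best_model, best_ll = None, -np.inf
    for k in range(1, max_components + 1):
        gm = GaussianMixture(n_components=k, random_state=seed).fit(train)
        ll = gm.score(test)  # mean log-likelihood on the held-out set
        if ll <= best_ll:    # performance decreased: return previous mixture
            return best_model, best_ll
        best_model, best_ll = gm, ll
    return best_model, best_ll  # component threshold reached

rng = np.random.default_rng(1)
# Synthetic data drawn from two well-separated clusters
data = rng.permutation(np.vstack([rng.normal(-5, 1, (300, 2)),
                                  rng.normal(5, 1, (300, 2))]))
model, ll = grow_mog(data[:400], data[400:])
print(model.n_components)
```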
ART stands for Adaptive Resonance Theory and encompasses a wide variety of neural networks based on neurophysiological models [28, 29]. In theory, ART networks are defined in terms of detailed differential equations intended as plausible models of biological neurons. In practice, however, they are implemented using analytical solutions and approximations to these differential equations. Learning in ART networks can be supervised, unsupervised or a combination of both. The ART network implemented in this project is based solely on unsupervised learning. Unsupervised ART is very similar to other iterative clustering algorithms, including EM
(mixture models) and other neural networks such as Vector Quantization (VQ). However, a crucial feature of ART that distinguishes it from the rest is that the number of categories (corresponding to mixture components, or prototypes in VQ) is determined dynamically during learning. In other words, ART does not require a predefined number of categories before clustering. This feature is said to solve the classical stability-plasticity dilemma in machine learning by supporting fast learning without forgetting previously learnt knowledge.

Fuzzy ART is an extension of binary ART that incorporates fuzzy logic. Input data is fuzzified by extending the feature values from strictly binary values (0 or 1) to fuzzy membership values in the range [0, 1]. This allows for more flexibility, as analogue signals can be used directly as input. In particular, Fuzzy ART better supports the continuous feature values in the feature model chosen for implementation in this project.

4.3.2.2 The principal architecture of the Fuzzy ART Network
The Fuzzy ART Network is essentially a three-layer adaptive neural network. In the figure below, the vectors s, x and y correspond to the activity in the three layers: the Input Field, the Feature Field and the Category Field. All nodes in the Feature Field and the Category Field are connected through the bottom-up weights b_ij and the top-down weights t_ij. The Input Field simply serves as a placeholder for the input pattern during the pattern categorization process described below.
Figure 4.3: The principal structure of the ART Network architecture. The Input Field (activity s) holds the input pattern, and the Feature Field (activity x) and the Category Field (activity y) are connected through the bottom-up weights b_ij and the top-down weights t_ij.

4.3.2.3 The dynamics of unsupervised ART
Before an input pattern is presented to the network, it has to be normalized. Furthermore, the normalized input pattern should be complement coded in order to avoid category proliferation [29]. The dynamics of unsupervised ART can be summarized as follows: a pattern that resembles an existing category fairly well updates that category, while a pattern that does not resemble any existing category generates a new category.
4.3.2.3.1 Learning
When learning (clustering), a new pattern is presented to the network by activating the nodes of the Input Field, which in turn moves the pattern to the Feature Field. The x signal of the Feature Field then activates the Category Field through the bottom-up weights b_ij. The category node having the highest activation is chosen as the winner node, corresponding to the category that most resembles the input pattern, while all remaining nodes are inhibited. Next, the x signal is first reset and then reactivated by the y signal (i.e. the winning category) through the top-down weights t_ij. The updated x signal is then compared to the original input pattern s. If they are sufficiently similar, resonance is said to occur, which involves learning: the weights (in both directions) between the winning category node and the Feature Field are updated in order to adjust to the newly learnt pattern (compare to the updating of clusters). However, if no resonance occurs, y is reset and the current winner node is inhibited. A new candidate winner node is then activated by repeating the process. This matching process is repeated until a category is found that resonates with the input pattern, i.e. a category that is sufficiently similar to it. Eventually, if no resonance occurs with any of the existing categories, a new category is created whose weights represent the new input pattern.

The learning algorithm in ART can be regarded as a greedy iterative algorithm in the sense that it incrementally builds and updates the network as new patterns are presented, and that the order in which patterns are presented influences how the network evolves and its final configuration.

4.3.2.3.2 Categorization without learning
Input patterns can also be categorized by the network without any learning taking place. The dynamics of the network are similar, but no learning occurs during resonance, i.e. the network returns the winning category without adjusting its weights. If no resonance occurs, the pattern is simply classified as unknown and no new category is created.

4.3.2.3.3 Parameters regulating the dynamics
The dynamics of the ART Network are essentially regulated by two parameters: vigilance and learning rate. The vigilance parameter takes values in [0, 1] and determines the threshold value related to resonance. High vigilance makes greater demands on the similarity between the input pattern and the categories in order for resonance to occur, which implies that more categories are generated (specialization). Lower vigilance loosens the resonance criterion, which implies that fewer categories are generated (generalization). The learning rate parameter also takes values in [0, 1] and simply determines how fast learning takes place, i.e. how fast the weights are updated. Setting the learning rate to one corresponds to the fast-learning mode, in which the weights are updated completely to a stable state before the next pattern is presented.
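The matching, resonance and update rules above can be condensed into a short sketch. The Python below follows the standard Fuzzy ART formulation (choice function |I ∧ w_j| / (α + |w_j|), vigilance test |I ∧ w_j| / |I| ≥ ρ); the project's own implementation was in MATLAB, and the small constant α and the class interface here are assumptions:

```python
import numpy as np

def complement_code(x):
    """Append the complement of an input already normalized to [0, 1]."""
    return np.concatenate([x, 1.0 - x])

class FuzzyART:
    def __init__(self, vigilance=0.95, learning_rate=1.0, alpha=1e-6):
        self.rho = vigilance      # resonance threshold
        self.beta = learning_rate  # 1.0 corresponds to fast learning
        self.alpha = alpha
        self.w = []                # category weight vectors

    def _match_order(self, I):
        """Category indices ordered by descending choice function."""
        scores = [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.w]
        return np.argsort(scores)[::-1]

    def train(self, x):
        I = complement_code(x)
        for j in self._match_order(I):
            w = self.w[j]
            if np.minimum(I, w).sum() / I.sum() >= self.rho:  # resonance
                self.w[j] = self.beta * np.minimum(I, w) + (1 - self.beta) * w
                return j
        self.w.append(I.copy())  # no resonance with any category: create one
        return len(self.w) - 1

    def categorize(self, x):
        """Classify without learning; None means 'unknown'."""
        I = complement_code(x)
        for j in self._match_order(I):
            if np.minimum(I, self.w[j]).sum() / I.sum() >= self.rho:
                return j
        return None

net = FuzzyART(vigilance=0.8)
for p in ([0.1, 0.1], [0.9, 0.9], [0.12, 0.1]):
    net.train(np.array(p))
print(len(net.w))  # two categories: the third pattern resonates with the first
```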
5 Implementation
This chapter covers implementation details: how data was pre-processed, descriptions of the machine learning software packages, and how they were used to train and test the models. The implementation of the machine learning algorithms and the scripts for presentation and pre-processing of data has been done in MATLAB. In addition, the initial part of the data pre-processing was implemented in Java.
Speed filter: filters out targets having very high speeds (over 44 knots). Such data may correspond to very fast motor boats, helicopters or other anomalies and is therefore unsuitable for inclusion in a model of normal behaviour.

Course filter: filters out targets having a negative course, which corresponds to an unknown course.

Port filter: filters out targets that are located in port areas.
Targets located in port areas usually have speeds close to zero and are therefore not very interesting to analyze in this model. As a matter of fact, the presence of these vessels tends to cause numerical instability and collapse of the mixture model during learning. Therefore, a function for filtering out the data points corresponding to vessels in port areas was implemented. The port filter function is based on a list of the latitude and longitude coordinates of the major ports located in Sweden, Denmark, Germany and Poland. The function simply removes all data points located within a certain radius of the port coordinates.

Finally, to decrease the size of the data without removing too much information, the data is subsampled by extracting every 10th sample from each individual target track. This corresponds to an updating interval of approximately three to four minutes for each vessel report. The updating interval was more or less arbitrarily chosen and has been considered adequate with respect to the speed and character of the vessel motion.
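The three filters and the subsampling step can be sketched together in Python (the actual pre-processing was done in MATLAB and Java; the report field names, the 2 nautical mile port radius and the use of a haversine distance are assumptions for illustration):

```python
import math

def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in nautical miles."""
    R_NM = 3440.065  # Earth radius in nautical miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * R_NM * math.asin(math.sqrt(a))

def filter_and_subsample(track, ports, port_radius_nm=2.0, step=10):
    """Drop reports over 44 knots, with unknown (negative) course, or near a
    port; then keep every `step`-th remaining report of the track."""
    kept = [r for r in track
            if r["speed"] <= 44 and r["course"] >= 0
            and all(haversine_nm(r["lat"], r["lon"], plat, plon) > port_radius_nm
                    for plat, plon in ports)]
    return kept[::step]

# Hypothetical track of 20 valid reports far from the single listed port
reports = [{"lat": 55.0, "lon": 14.0 + i * 0.01, "speed": 10.0, "course": 90.0}
           for i in range(20)]
print(len(filter_and_subsample(reports, ports=[(57.7, 11.9)])))  # 2
```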
The fourth argument refers to the number of candidate clusters to be considered when inserting a new mixture component during greedy EM. If this number is set to zero, standard non-greedy EM is performed instead. The test data set is optional if non-greedy EM is
The port coordinates were obtained from http://www.world-register.org
performed. However, a test set is required during greedy EM, as the stopping criterion depends on the current mixture's performance on the test set, i.e. when performance starts to decrease as the number of components increases, the algorithm terminates. If the plot flag is enabled and the data is represented in one or two dimensions only, a plot window will appear illustrating the dynamics and result of the EM algorithm. The plot window shows all the data points in the background and iteratively updates the plots of the mixture components as the means and covariances are adjusted to better fit the data. Apart from the plot window, the output of the main function when the algorithm has terminated is a matrix containing the weight, mean values and covariance of each mixture component. If test data was supplied, the average log-likelihood of the test data is also returned.
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=4306&objectType=File
away from any of the categories of the network in order for resonance to occur), are classified as anomalous. If no network is available for a particular grid, all the corresponding evaluation samples are classified as unknown.
6 Experimental evaluation
This chapter describes the setup, goals and results of the experimental evaluation of the anomaly detection system described and implemented in this project.
The labelled anomalies in the recorded scenarios can be categorized as three grounding scenarios, two collision scenarios and one unexpected stop scenario. The artificially created data include two simulated scenarios: a speeding scenario and a smuggling scenario. These scenarios include anomalies that correspond to vessels that are:

- crossing or travelling more or less perpendicular to main sea lanes
- travelling relatively fast in sea lanes and other speed-normalized areas
- stopping or travelling very slowly in the middle of, or close to, sea lanes
When training the MoG with greedy EM, the maximum number of mixture components was fixed at 20. However, this number of components was never reached during training. The number of candidate components per mixture component during insertion was fixed at 10. During anomaly detection, the only parameter that has been adjusted is the anomaly likelihood threshold.

6.1.4.2 ART Networks
The learning rate has been set to one (i.e. fast learning) and the vigilance parameter fixed at 0.95 for all networks during learning. However, the vigilance value for categorization has been varied during anomaly detection.
6.2 Results
Qualitative results from the experimental anomaly detection are presented as figures illustrating traffic corresponding to training data and evaluation data classified by the model. The colour coding for the traffic is as follows:

Light grey traffic corresponds to the model's training data.
Dark grey traffic corresponds to evaluation data classified as regular by the model.
Black traffic corresponds to evaluation data classified as anomalous by the model.
Figure 6.1: Results from global anomaly detection of the unlabelled data using a four-dimensional MoG for each square, with the anomaly threshold set to -30.

Figure 6.2: Results from global anomaly detection of the unlabelled data using a two-dimensional MoG for each square, with the anomaly threshold set to -17.
Figure 6.3: Results from global anomaly detection of the unlabelled data using a four-dimensional ART Network for each square, trained using vigilance value 0.95 and evaluated using vigilance value 0.94.

Figure 6.4: Results from global anomaly detection of the unlabelled data using a two-dimensional ART Network for each square, trained using vigilance value 0.95 and evaluated using vigilance value 0.92.
6.2.1.1 Overlap between the anomalies detected by the different models
This section presents quantitative results from comparing the sets of anomalous data points detected by the different models. The purpose is to show to what extent the different feature models and clustering algorithms detect the same anomalies. Let:
A_4dim-MoG be the set of unlabelled points classified as anomalous by the 4-dimensional MoG

A_2dim-MoG be the set of unlabelled points classified as anomalous by the 2-dimensional MoG

A_4dim-ART be the set of unlabelled points classified as anomalous by the 4-dimensional ART Network

A_2dim-ART be the set of unlabelled points classified as anomalous by the 2-dimensional ART Network

A_MoG = A_4dim-MoG ∪ A_2dim-MoG be the union of the sets of unlabelled points classified as anomalous by the mixture models

A_ART = A_4dim-ART ∪ A_2dim-ART be the union of the sets of unlabelled points classified as anomalous by the ART Networks

A_4dim = A_4dim-MoG ∪ A_4dim-ART be the union of the sets of unlabelled points classified as anomalous by the four-dimensional feature models

A_2dim = A_2dim-MoG ∪ A_2dim-ART be the union of the sets of unlabelled points classified as anomalous by the two-dimensional feature models

D be the complete set of unlabelled data, classified by the models
Then, for each particular pair of feature model and clustering algorithm, the number of points classified as anomalous divided by the total number of data points is:

|A_4dim-MoG| / |D| = 0.72%
|A_2dim-MoG| / |D| = 0.74%
|A_4dim-ART| / |D| = 0.78%
|A_2dim-ART| / |D| = 0.73%
Let us define a similarity measure between two sets of (anomalous) points as:
S(A_1, A_2) = |A_1 ∩ A_2| / |A_1 ∪ A_2|
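This similarity measure is the Jaccard index over the two anomaly sets. A minimal Python illustration (the point IDs here are hypothetical, not the thesis data):

```python
def jaccard(a, b):
    """S(A1, A2) = |A1 ∩ A2| / |A1 ∪ A2| for two sets of anomalous points."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical anomalous point IDs detected by two models
a4_mog = {101, 102, 103, 104}
a2_mog = {102, 103, 104, 105}
print(f"{jaccard(a4_mog, a2_mog):.2%}")  # 60.00%: 3 shared out of 5 in total
```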
Using this similarity function, we can now evaluate to what degree the different pairs of feature models and/or clustering algorithms detect the same anomalies:
S(A_4dim-MoG, A_2dim-MoG) = 63.24%
S(A_4dim-MoG, A_4dim-ART) = 59.66%
S(A_4dim-MoG, A_2dim-ART) = 56.01%
S(A_2dim-MoG, A_4dim-ART) = 69.28%
S(A_2dim-MoG, A_2dim-ART) = 64.95%
S(A_4dim-ART, A_2dim-ART) = 81.54%
S(A_MoG, A_ART) = 60.84%
S(A_4dim, A_2dim) = 70.60%
Comparing the intersection of all sets of anomalous points with the union of the corresponding sets, we get a measure of the degree to which all four models detect the same anomalies:
|A_4dim-MoG ∩ A_2dim-MoG ∩ A_4dim-ART ∩ A_2dim-ART| / |A_4dim-MoG ∪ A_2dim-MoG ∪ A_4dim-ART ∪ A_2dim-ART| = 46.75%
The following figures present three sample areas where all anomalies in the unlabelled data, for the particular area, are jointly detected by all four models.
Figure 6.5: Obvious anomaly detections: the black tracks correspond to vessels travelling in an area where no other vessels have travelled before, and in the opposite direction of the main sea lane.
Figure 6.6: The majority of the model tracks are concentrated in the sea lane going in a north-west direction, while a minority of the model tracks deviate from it. Two vessel tracks are detected as anomalous: the first travels close to the sea lane but in exactly the opposite direction, and the other crosses the sea lane at a narrow angle.
Figure 6.7: The main sea lane is less distinct in this area. However, the detected anomaly stands out clearly: a vessel crossing the local tracks at a relatively high speed.
6.2.2.2 Anomalies detected exclusively by the MoG models
The following figures present three sample areas where all anomalies in the unlabelled data, for each particular area, are exclusively detected by the two MoG models.
Figure 6.8: The main two-way sea lane is clearly illustrated by the high concentration of light grey training data in two opposite directions. The anomaly, exclusively found by the MoG models, corresponds to a vessel travelling in an anomalous direction as it crosses the main sea lane.
Figure 6.9: The main two-way sea lane is clearly illustrated by the high concentration of light grey training traffic in two opposite directions. A minority of the traffic travels in parallel to the main sea lane, but still follows the general direction of motion. However, the anomaly, exclusively found by the MoG models, corresponds to a vessel travelling in an anomalous direction as it crosses the direction of travel of the traffic.
4-dimensional MoG & 2-dimensional MoG
Figure 6.10: The main two-way sea lane is clearly illustrated. The anomaly, exclusively found by the MoG models, corresponds to a vessel crossing the main sea lane. Notice that a few other vessels (corresponding to the light grey training data) have previously crossed the sea lane at nearby locations. However, these occurrences are probably too few to have any significant impact on the trained models.
6.2.2.3 Anomalies detected exclusively by the ART models
The following figures present three sample areas where all anomalies in the unlabelled data, for each particular area, are exclusively detected by the two ART models.
Figure 6.11: The anomalies, exclusively detected by the ART models, correspond to a vessel that travels in locally anomalous directions and, above all, travels at a very low speed.
Figure 6.12: The few anomalies, exclusively detected by the ART models, correspond to anomalous vessel motion in the middle of the major sea lane. Specifically, the relatively high speed and, in some cases, the reverse direction of travel stand out as clear anomalies in contrast to the regular traffic.
4-dimensional ART & 2-dimensional ART
Figure 6.13: Traffic data available for this area is relatively sparse. However, a single vessel in the upper left corner, distinguished by its relatively high speed, is exclusively detected as an anomaly by the ART models.
This scenario involves a vessel making an unexpected stop in its route, a few nautical miles north-west of Bornholm. The situation may be a potential smuggling scenario where the cargo is dumped during the stop and later picked up by another vessel.
Figure 6.14: Potential smuggling scenario, a few nautical miles north-west of Bornholm. The tracks in black correspond to a vessel that has made an unexpected stop in the area indicated by the circle.

Table 6.1: Statistics for the unexpected stop scenario

                                                        4-dim MoG | 2-dim MoG
Mean log-likelihood of the labelled data inside
the indicated stop area                                   -2.99   |   -9.31
Share of unlabelled data for this region having a
likelihood less than or equal to the mean likelihood
of the data inside the stop area                          5.76%   |   1.57%
6.2.3.2
Vessel collisions
The first collision scenario involves a collision between a large freighter vessel and a smaller freighter vessel, a few nautical miles north-west of Bornholm. After the collision, a number of vessels in the neighbouring area appear to travel straight to the collision zone in order to assist the involved vessels.
Figure 6.15: The tracks of the two colliding vessels are coloured black, and the location of the collision is encircled. The dark grey tracks highlight unlabelled data classified as anomalous by a four-dimensional MoG at threshold -10, which corresponds approximately to the mean log-likelihood of the colliding vessels' data. These tracks are assumed to correspond to vessels deviating from their routine course in order to assist the distressed vessels.
Table 6.2: Statistics for the first collision scenario

                                                        4-dim MoG | 2-dim MoG
Mean log-likelihood of data corresponding to the
first vessel in the collision zone                       -10.71   |  -10.52
Mean log-likelihood of data corresponding to the
second vessel in the collision zone                      -10.04   |  -10.46
Share of unlabelled data for this region having a
likelihood less than or equal to the mean likelihood
of the second vessel in the collision zone                 0.3%   |    0.8%
The second collision scenario involved an accident at the west coast of Sweden, south-west of Varberg, where a large passenger ferry collided with a smaller freighter vessel.
Figure 6.16: Vessel collision at the west coast of Sweden, south-west of Varberg. The tracks of the two colliding vessels are coloured black and dark grey, and the location of the collision is encircled.
Table 6.3: Statistics for the second collision scenario

                                                        4-dim MoG | 2-dim MoG
Mean log-likelihood of data corresponding to the
vessel in black in the collision zone                     -1.41   |   -9.23
Mean log-likelihood of data corresponding to the
vessel in dark grey in the collision zone                 -1.72   |  -10.08
Share of unlabelled data for this region having a
likelihood less than or equal to the mean likelihood
of the vessel in dark grey in the collision zone          74.7%   |    2.0%
6.2.3.3
Grounding scenarios
The first grounding scenario took place close to the coast of Denmark, north-west of Helsingør.
Figure 6.17: The tracks in black correspond to the vessel that has grounded, the grounding spot indicated by the circle.
Table 6.4: Statistics for the first grounding scenario

                                                        4-dim MoG | 2-dim MoG
Mean log-likelihood of the labelled data in the
indicated grounding area                                  -1.30   |   -4.88
Share of unlabelled data for this region having a
likelihood less than or equal to the mean likelihood
of the labelled data in the grounding area                11.4%   |   28.5%
Figure 6.18: Grounding scenario west of Malmö sea port (data around the port area has been filtered away, as can be seen in the figure). The tracks in black correspond to the vessel that has grounded, the grounding spot indicated by the circle.
Table 6.5: Statistics for the second grounding scenario

                                                        4-dim MoG | 2-dim MoG
Mean log-likelihood of the labelled data in the
indicated grounding area                                  -0.25   |   -3.64
Share of unlabelled data for this region having a
likelihood less than or equal to the mean likelihood
of the labelled data in the grounding area                21.5%   |   25.2%
The third grounding scenario took place in the Öresund passage, southwest of Landskrona:
Figure 6.19: Grounding scenario in the Öresund passage, southwest of Landskrona. Traffic around the Danish island Ven has been filtered away, as can be seen by the large white area in the centre of the figure. The tracks in black correspond to the vessel that has grounded, the grounding spot indicated by the circle.
Table 6.6: Statistics for the third grounding scenario

                                                        4-dim MoG | 2-dim MoG
Mean log-likelihood of the labelled data in the
indicated grounding area                                  -1.96   |   -5.13
Share of unlabelled data for this region having a
likelihood less than or equal to the mean likelihood
of the labelled data in the grounding area                19.2%   |   23.5%
The vessel in black corresponds to a large freighter travelling at a speed of around 7 knots. It slows down to a speed below 1 knot in the indicated rendezvous area, dumps its cargo and then, after a short while, accelerates back up to 7 knots. Meanwhile, another smaller vessel (coloured dark grey) approaches the rendezvous area perpendicularly to the sea lane at a speed of around 12 knots. When it reaches the area it slows down, picks up the cargo and accelerates back up to 12 knots, heading back in the direction it came from.
Figure 6.20: The tracks in black and dark grey correspond to the large freighter and smaller pick-up vessel respectively.
Table 6.7: Statistics for the large freighter (black tracks) involved in the smuggling scenario

                                                        4-dim MoG | 2-dim MoG
Mean log-likelihood of data inside the rendezvous area    -3.23   |   -8.72
Mean log-likelihood of data outside the rendezvous area   -0.82   |   -5.94
Share of unlabelled data in this region having a
likelihood less than or equal to the mean likelihood
of artificial data inside the rendezvous area              6.5%   |    1.3%
Table 6.7: Statistics for the smaller pick-up vessel (dark grey tracks) involved in the smuggling scenario

                                                        4-dim MoG | 2-dim MoG
Mean log-likelihood of data inside the rendezvous area    -5.10   |  -10.55
Mean log-likelihood of data outside the rendezvous area  -48.39   |  -55.11
Share of unlabelled data in this region having a
likelihood less than or equal to the mean likelihood of
artificial data inside / outside the rendezvous area    2.2% / 0% | 0.5% / 0%
6.2.4.2 The speeding scenario
In this scenario, artificial data for a vessel travelling at a relatively high speed in a sea lane has been evaluated. The mean speed of the (real) vessel data used for training the model in this area is 6.97 knots, with a variance of 3.22. The artificial vessel travels in the main direction of the sea lane at a speed of around 15 knots.
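As a rough sanity check (not part of the thesis itself), the quoted speed statistics alone already mark 15 knots as exceptional: under a one-dimensional Gaussian with the stated mean and variance, the artificial vessel lies more than four standard deviations from the mean. A minimal sketch:

```python
# One-dimensional illustration using the speed statistics quoted above
# (mean 6.97 knots, variance 3.22); the models actually evaluated in this
# chapter are 2- and 4-dimensional Gaussian mixtures, not this toy.
import math

mean, var = 6.97, 3.22   # speed statistics of the training data in the lane
speed = 15.0             # speed of the artificial vessel

z = (speed - mean) / math.sqrt(var)                        # standardized distance
log_lik = -0.5 * math.log(2 * math.pi * var) - 0.5 * z**2  # Gaussian log-density

# z is roughly 4.5, so the log-likelihood of the artificial speed is far
# below that of typical traffic in the lane.
```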
Figure 6.21: Speed vectors in black correspond to artificially generated data for a vessel travelling at a relatively high speed in the sea lane.
Table 6.9: Statistics for the speeding vessel

                                                      4-dimensional MoG   2-dimensional MoG
  Mean log-likelihood of artificial data                   -8.25              -14.7
  Share of unlabelled evaluation data classified
  as anomalous at this level                                0.5%               0.2%
normal behaviour, as new categories are created instantaneously when needed. Furthermore, ART networks can easily be extended to support supervised learning by introducing an ARTMAP structure; this extension is discussed in the future work section below. However, ART's ability to quickly learn new patterns also implies that the system is more sensitive to overtraining and noise. If an anomalous pattern is mistakenly presented as a normal pattern during training, the network will instantaneously create an erroneous category for it. Learning a MoG with EM is much less sensitive to noise, due to the statistical nature of the algorithm, particularly when the noise is stochastic; the relative weights of noisy points are negligible when the clusters are updated during EM. Thus, if there is reason to believe that the training data might include anomalies, EM and MoG should be preferred to unsupervised ART.

The MoG has natural support for an uncertainty measure, as it outputs the likelihood of each point under the model. The ART network lacks a corresponding feature, as it does not output any information regarding the degree of similarity between the categories and the input patterns. Another disadvantage that ART shares with neural networks in general is the lack of transparency: while a MoG can be analysed explicitly by considering its underlying probability distributions, ART networks behave more like black boxes.
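The robustness argument can be made concrete with a toy one-dimensional EM mean update. This is an illustration only: the data, the current parameter estimates and the broad "background" component below are all invented, not part of the implemented system.

```python
# Toy illustration of why EM-fitted Gaussians are robust to a single noisy
# point: the outlier's responsibility-weighted contribution to the updated
# cluster mean is negligible. All values here are invented.
import math

def gauss_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

data = [4.8, 5.0, 5.2, 4.9, 5.1, 20.0]   # last point is an anomaly
mu, var = 5.0, 0.2                        # current cluster estimate
bg_mu, bg_var = 10.0, 100.0               # broad background component
w_cluster, w_bg = 0.9, 0.1                # mixing weights

num, den = 0.0, 0.0
for x in data:
    p_c = w_cluster * gauss_pdf(x, mu, var)
    p_b = w_bg * gauss_pdf(x, bg_mu, bg_var)
    r = p_c / (p_c + p_b)   # responsibility of the cluster for point x
    num += r * x
    den += r
new_mu = num / den
# The outlier at 20.0 receives responsibility near zero, so the updated
# mean stays very close to 5.0 despite the contaminated training set.
```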
Generally, segments are characterized as regular segments if they are of a specific minimum length and have angular speed and linear acceleration close to constant. The regular segments are then further divided into subcategories according to more specific kinematical constraints:

Regular straight segment
- Linear speed is different from zero during the whole segment.
- Linear acceleration is close to zero during the whole segment.
- Angular speed is close to zero (constant course) during the whole segment.

Weak yaw segment
- Angular speed is close to constant and different from zero during the whole segment.
- Relative difference in angle between the course at the starting point and the course at the ending point is within a certain interval (smaller angular difference than for a sharp yaw).

Sharp yaw segment
- Angular speed is close to constant and different from zero during the whole segment.
- Relative difference in angle between the course at the starting point and the course at the ending point is within a certain interval (greater angular difference than for a weak yaw).

Brake segment
- Linear acceleration is constant and below a certain negative threshold during the whole segment.

Acceleration segment
- Linear acceleration is constant and above a certain positive threshold during the whole segment.

Stop segment
- Linear speed is very close to zero during the whole segment.
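A minimal sketch of such a rule-based segment classifier is given below. The `Segment` container, the tolerance `EPS` and the yaw boundary `SHARP_YAW_DEG` are illustrative assumptions, not values from this proposal; segments matching no rule fall through to the irregular category described below.

```python
# Illustrative segment classifier; thresholds and the Segment container
# are assumptions for the sketch, not taken from the thesis.
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    speeds: List[float]       # linear speed samples (knots)
    accels: List[float]       # linear acceleration samples
    ang_speeds: List[float]   # angular speed samples (deg/s)
    course_change: float      # |course(end) - course(start)| in degrees

EPS = 0.1             # "close to zero" tolerance (assumed)
SHARP_YAW_DEG = 45.0  # boundary between weak and sharp yaw (assumed)

def near(xs, target, tol=EPS):
    return all(abs(x - target) <= tol for x in xs)

def constant(xs, tol=EPS):
    return max(xs) - min(xs) <= tol

def classify(seg: Segment) -> str:
    if near(seg.speeds, 0.0):                                  # stop segment
        return "stop"
    if constant(seg.ang_speeds) and not near(seg.ang_speeds, 0.0):
        return "sharp yaw" if seg.course_change >= SHARP_YAW_DEG else "weak yaw"
    if constant(seg.accels) and min(seg.accels) > EPS:         # accelerating
        return "acceleration"
    if constant(seg.accels) and max(seg.accels) < -EPS:        # braking
        return "brake"
    if (near(seg.accels, 0.0) and near(seg.ang_speeds, 0.0)
            and all(s > EPS for s in seg.speeds)):             # straight
        return "straight"
    return "irregular"                                         # jerky motion
```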
The parts of the trajectory that remain when all regular segments have been extracted correspond to irregular segments. These segments generally involve more or less jerky motion, as the linear and angular accelerations differ from zero.

7.7.1.2 Characterizing vessel trajectories in a high-dimensional feature space
A vessel trajectory is characterized by first segmenting it and then counting the number of segments of each type within a limited time window. Each such partial trajectory analysis generates a data point in a high-dimensional feature space where each dimension corresponds to the number of occurrences of a particular manoeuvre (i.e. segment type). As an example, consider a seven-dimensional space where the features are the numbers of 1) straight segments, 2) weak yaw segments, 3) sharp yaw segments, 4) brake segments, 5) acceleration segments, 6) stop segments and 7) irregular segments. In addition, features corresponding to the total time duration of each segment type could be added to the model.

Clusters in this high-dimensional space would correspond to typical manoeuvre patterns associated with particular vessel classes: large freighter ships, car ferries, fishing boats, pilots etc. The clusters could be represented by a mixture of multivariate Poisson distributions that model the expected number of occurrences of each manoeuvre within each cluster. To estimate the parameters of the Poisson mixture, an adapted version of the EM algorithm may be used, similar to the case with mixtures of Gaussians. The Poisson distribution has the advantage of being defined by a single parameter (compared to the Gaussian, which is defined by its mean and variance). This implies that the Poisson distribution does not run the risk of collapsing during the EM algorithm, as the Gaussian does when its variance approaches zero.

An alternative to the Poisson mixture model is to introduce a Hidden Markov Model (HMM) that treats the different types of segments as states of the vessel motion. By using an HMM to model sequences of segments, the likelihood of an observed sequence of manoeuvres can be calculated, thereby identifying more or less anomalous sequences of manoeuvres.
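As a sketch of how such a Poisson mixture would score a manoeuvre-count window, the following computes the mixture log-likelihood of a count vector under independent per-dimension Poisson components. The two clusters, their weights and their rates are invented for illustration; in practice they would be estimated with EM as discussed above.

```python
# Sketch of a mixture of independent Poissons over manoeuvre counts;
# the component parameters below are invented for illustration.
import math

def poisson_log_pmf(k, lam):
    # log( lam^k * exp(-lam) / k! )
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def mixture_log_likelihood(counts, weights, lambdas):
    """counts: manoeuvre-count vector for one time window.
    weights: mixing proportions, one per cluster (vessel class).
    lambdas: per-cluster expected counts, one per manoeuvre type."""
    per_cluster = []
    for w, lams in zip(weights, lambdas):
        log_p = math.log(w) + sum(poisson_log_pmf(k, l)
                                  for k, l in zip(counts, lams))
        per_cluster.append(log_p)
    m = max(per_cluster)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in per_cluster))

# Two invented clusters: "freighter-like" (mostly straight) and
# "fishing-like" (many yaws and stops). Feature order: straight, weak yaw,
# sharp yaw, brake, acceleration, stop, irregular.
weights = [0.7, 0.3]
lambdas = [[8.0, 1.0, 0.2, 0.3, 0.3, 0.1, 0.5],
           [2.0, 4.0, 2.0, 1.0, 1.0, 3.0, 2.0]]
freighter_window = [9, 1, 0, 0, 1, 0, 0]
erratic_window = [0, 5, 4, 2, 2, 1, 6]
# A low mixture log-likelihood flags a window whose manoeuvre mix
# fits none of the learned clusters.
```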
8 Conclusion
In the first phase of this project, an overview of different artificial intelligence approaches to the problem of situation assessment has been presented. The potential and capabilities of the different approaches have been discussed, resulting in suggestions for future development of the prototype system for situation assessment developed by Saab. Two approaches to the problem of uncertainty handling have been proposed: hierarchical event recognition through Bayesian Networks, and a Fuzzy Reasoning approach involving fuzzy rules and fuzzy truth values.

Characterizing all events and situations that may be of interest to a supervisor is a very difficult task. The set of available examples of each particular event or situation is usually very limited, as the events and situations sought occur relatively rarely and may vary significantly from one case to another. Turning the problem around, however, these rare events and situations can be detected as anomalies relative to a model of routine behaviour. Large amounts of data corresponding to routine behaviour are usually available, which motivates the use of Data Mining and clustering techniques for building models of normal behaviour.

In the second phase of this project, anomaly detection based on unsupervised machine learning techniques has been further investigated and implemented in the domain of sea surveillance. The system is based on unsupervised clustering of vessel traffic data, where the data model is specified by a particular feature model. Two feature models have been developed and implemented: a two-dimensional model based on the momentary velocity in the two-dimensional plane, and a four-dimensional extension of the base model that also incorporates the momentary spatial position.
Clustering has been done with two different models and learning algorithms: one based on Mixtures of Gaussians (MoG) and Expectation-Maximization, and the other based on Neural Networks and Adaptive Resonance Theory (ART). The feature models and algorithms have been evaluated using real recorded data from the sea surveillance centre in Malmö together with artificially generated data. The recorded data is in principle unlabelled and is assumed to reflect normal traffic; however, a few labelled anomalies involving vessel collisions, groundings and unexpected stops are present in it. The artificial data corresponds to vessels involved in a simulated smuggling scenario and a speeding scenario. The four combinations of the two feature models and the two clustering algorithms have been evaluated by training on a majority of the unlabelled recorded data and testing on the remaining minority of the recorded data, including the labelled anomalies, together with the artificial anomalous data.

Qualitative results show that the most distinguishing anomalies found in the unlabelled data correspond to vessels crossing sea lanes and vessels travelling close to and in the opposite direction of sea lanes. Generally, the four models detect the same anomalies to a rather large extent. However, the results indicate that the MoG models are more sensitive to the direction of the motion, while the ART models are more sensitive to the absolute speed of the motion. Qualitative analysis of the recorded labelled anomalies indicated that the MoG models were quite effective in detecting the collisions and the unexpected stop as anomalies among routine traffic. However, the models performed rather poorly in detecting the anomalies related to the grounding scenarios. Comparing the two feature models, the results indicate that the spatial information incorporated in the four-dimensional feature model does not, in general, enhance the system's ability to detect the labelled anomalies.
Comparing the two cluster models, the mixture model has been considered more suitable when the training data contains noise or anomalies; the dynamic learning feature of ART implies fast learning of new patterns at the cost of increased sensitivity to noise. Generally, the types of anomalies that are detectable by the implemented systems, at a reasonable alarm rate, are rather elementary in nature, and it is not hard to imagine less advanced ad hoc systems that could identify the same anomalies. However, the generality of the system should be pointed out: since it is based on unsupervised algorithms, it has the potential of being applied to other domains involving generic motion in the two-dimensional plane, with minimal adaptation and no specific domain knowledge.

An extension involving a more sophisticated feature model, based on manoeuvres in the motion pattern, has been proposed for future work. Such a feature model would be a powerful complement, capable of capturing anomalies in motion tracks that evolve over time. Another suggested extension involves the introduction of semi-supervised learning, allowing an operator to make learning more effective by confirming or rejecting anomalies detected by the system. Such a system, implemented as an ARTMAP structure, would benefit from operator feedback without being dependent on it, as it self-organizes when operator input is not available.
www.kth.se