Keywords: Air transportation; Deep learning; Support vector machine; System safety; Risk assessment

With the spectacular growth of air traffic demand expected over the next two decades, the safety of the air transportation system is of increasing concern. In this paper, we adopt the “proactive safety” paradigm to increase system safety, with a focus on predicting the severity of abnormal aviation events in terms of their risk levels. To accomplish this goal, a predictive model needs to be developed to examine a wide variety of possible cases and quantify the risk associated with the possible outcome. By utilizing the incident reports available in the Aviation Safety Reporting System (ASRS), we build a hybrid model consisting of a support vector machine and an ensemble of deep neural networks to quantify the risk associated with the consequence of each hazardous cause. The proposed methodology is developed in four steps. First, we categorize all the events, based on the level of risk associated with the event consequence, into five groups: high risk, moderately high risk, medium risk, moderately medium risk, and low risk. Second, a support vector machine model is used to discover the relationships between the event synopsis in text format and the event consequence. In parallel, an ensemble of deep neural networks is trained to model the intricate associations between event contextual features and event outcomes. Third, an innovative fusion rule is developed to blend the prediction results from the two types of trained machine learning models, thereby improving the prediction. Finally, the prediction on risk-level categorization is extended to event-level outcomes through a probabilistic decision tree. By comparing the performance of the developed hybrid model against three individual models with ten-fold cross-validation and statistical tests, we demonstrate the effectiveness of the hybrid model in quantifying the risk related to the consequences of hazardous events.
1. Introduction

The International Air Transport Association (IATA) forecasts that air traffic demand will grow at a 3.7% annual Compound Average Growth Rate (CAGR), and there will be 7.2 billion air travelers in 2035 [1,2]. Thus, compared to the 3.8 billion air travelers in 2016, air travel demand will nearly double over the next two decades. The rapid growth in air traffic demand will put pressure on the air transportation system, which is already struggling to cope with the current demand. System-wide departure delays and en-route congestion will deteriorate due to the massive increase in the number of aircraft within the limited airspace. Such consequences further contribute to an increased number of conflicts in the air traffic, which might escalate to collisions or evolve into other hazardous events [3-5]. The rapid increase of air traffic demand also puts tremendous pressure on air traffic operators in maintaining system safety at the same level as before. All the aforementioned factors could increase the occurrence rate of air traffic incidents.

Since air traffic accidents usually have severe consequences, the safety of the air transportation system has received great attention [6-8]. Over the past decades, numerous efforts have been dedicated to the development of comprehensive safety metrics as well as qualitative/quantitative approaches to detect anomalous behavior, assess the safety, and quantify the risk associated with one subsystem (e.g., airport ground operations, flight tracking, or taxiway and runway system) or a mixture of subsystems in the air transportation system [9]. For example, Netjasov and Janic [10] performed a comprehensive review of four state-of-the-art methods, namely causal models, collision risk models, human error models, and third-party risk models, for the assessment of risk and safety in civil aviation. McCallie et al. [11] described a taxonomy of possible attacks on one of the most critical technical upgrades – the Automatic Dependent Surveillance Broadcast (ADS-B) system – in the Next Generation (NextGen) Air Transportation System; they examined the risks inherent in the ADS-B implementation and their potential impact, thereby supporting risk mitigation and management. Margellos and Lygeros [12] used Monte Carlo Simulation (MCS) to assess the probability of flights meeting the Target Windows (TWs) constraints and the probability of aircraft having conflicts in the
* Corresponding author.
E-mail address: sankaran.mahadevan@vanderbilt.edu (S. Mahadevan).
https://doi.org/10.1016/j.dss.2018.10.009
Received 22 May 2018; Received in revised form 6 September 2018; Accepted 17 October 2018
Available online 05 November 2018
0167-9236/ © 2018 Elsevier B.V. All rights reserved.
X. Zhang, S. Mahadevan Decision Support Systems 116 (2019) 48–63
4-D Trajectory-Based Operations (TBO). Landry et al. [13] developed a conflict detection and resolution (CD&R) decision support system for airport surface operations, and illustrated it with an example from the Hartsfield Atlanta International Airport. Di Ciccio et al. [14] proposed a model to automatically detect flight trajectory anomalies by using semi-publicly available data, so that the receiving parties can be alerted to take responsive actions in a timely manner. Sankararaman et al. [15] investigated the vulnerability of Trajectory-Based Operations (TBO) to technology disruption, and evaluated the degradation of four key performance indicators (KPI) subject to the technological impairment.

In general, there are two principal strategies to enhance air transportation system safety: (I) increase system/subsystem reliability, i.e., reduce the probability of operators making mistakes or systems having malfunctions; and (II) improve the operators' preparedness, so that when a hazardous event occurs, the operator is able to take appropriate safety measures in a timely manner to reduce the cost of the consequence caused by the hazardous event [16-18]. Although tremendous progress has been made over the past decades in enhancing air transportation system safety, most of the studies emphasize identifying accident precursors and initiating corrective actions to prevent future errors, which can be grouped into the first strategy. Among the existing studies, much of the effort has been devoted to investigating human-factor-induced accidents and constructing causal models [10,19]. For example, Wiegmann and Shappell [20] described the Human Factors Analysis and Classification System (HFACS) and provided aviation case studies on human factor analysis with HFACS. However, air transportation is a complex system of systems involving many varied yet interlinked distributed networks of human operators, technical/technological procedures and systems, and organizations/stakeholders. The scarcity of operational data and the involvement of a variety of human operators are major challenges that severely affect the prediction of erroneous operation, and challenge the assessment and improvement of system reliability. In view of this, it would be beneficial to leverage the second strategy and emphasize the “proactive safety” paradigm [21], focused on strengthening the capability and efficiency of the operators in responding to abnormal events with appropriate actions, thereby mitigating the consequence or severity of hazardous events. In this connection, a large number of incident/accident reports have been filed over the past half century. A few machine learning studies have been carried out to analyze these reports. For example, Oza et al. [22] developed two algorithms – Mariana and a nonnegative matrix factorization-based algorithm – to perform multi-label classification for the reports of the Aviation Safety Reporting System (ASRS). Budalakoti et al. [23] presented a set of algorithms to detect and characterize anomalies in discrete symbol sequences arising from recordings of switch sensors in the cockpits of commercial airliners. These studies have not explored the intricate relationships between abnormal event characteristics and the induced consequences. In this paper, we are motivated to fill this gap by quantifying the risk associated with the consequence of hazardous events through a hybrid machine learning model. This type of analysis will help to develop a decision support system that can learn the patterns in the data and help the analyst examine incidents/accidents quickly and systematically, thus assisting the risk manager in risk quantification, priority setting, resource allocation, and decision making in support of the implementation of the proactive safety paradigm.

To do this, we have utilized the Aviation Safety Reporting System (ASRS), a database of incident reports that provides a plethora of incidents/accidents that have occurred over the course of the past several decades. The incident reports are submitted by pilots, air traffic controllers, dispatchers, cabin crew, maintenance technicians, and others, and describe both unsafe occurrences and hazardous situations [24]. ASRS covers almost all the domains of aircraft operations that could go wrong. However, it is a non-trivial task to build a model from the ASRS data for the prediction of risk associated with the consequence of hazardous events due to the following challenges:

1. High-dimensional data. Each incident record consists of more than 50 items ranging from the operational context (weather, visibility, flight phase, and flight conditions) to the characteristics of the anomalous operation (aircraft equipment, malfunction type, and event synopsis). The URL https://asrs.arc.nasa.gov/docs/dbol/ASRS_CodingTaxonomy.pdf gives a detailed description of each item. Besides, air traffic operation is a complex system composed of a number of programs and personnel. An incident can occur at any phase and location due to a variety of factors (e.g., operation violation, visibility, human factors, etc.), which makes it hard for a machine learning algorithm to predict the exact level of risk associated with each event outcome.

2. Primarily categorical data. Over 99% of the items in the ASRS database are categorical, and only one attribute (crew size) is numerical in each record. Although the categorical information offers a high-level description of the context of each incident, very limited information specific to each abnormal event can be derived from it, e.g., how did the incident happen, how did the incident evolve over time in the system, what operation did the pilot take to resolve the issue, etc. Since the categorical features are not very informative, how to blend the predictions of the model trained on the categorical features with the predictions from the model trained on other, more informative indicators (e.g., event synopsis) without compromising model performance is an issue worthy of investigation.

3. Unstructured data. One important unstructured attribute is the event synopsis, which is a concise summary of the incident/accident in the form of text. Mining causal relationships from unstructured text data is a daunting task [25]. A common approach in handling text data is to transform it into numerical data representing the term frequencies of each individual document. Along this direction, researchers have developed numerous techniques to elicit useful knowledge from text data, e.g., support vector machines [26], latent Dirichlet allocation-based topic mining [27-29], Naïve Bayes-based document classification [30], k-nearest-neighbor (k-NN) classification [31], and others [32,33]. Among them, the support vector machine [34] has demonstrated good performance in text categorization because it overcomes over-fitting and local minima, thereby achieving good generalization in applications [35,36].

4. Imbalanced class distribution. In the ASRS, the number of records in one class (i.e., possible outcome) is significantly larger than that of the others. The distribution of the outcomes for all the hazardous events reported between January 2006 and December 2017 is illustrated in Fig. 1. As can be observed, the number of records across different classes is highly imbalanced. Such an imbalanced class distribution poses a serious challenge to machine learning algorithms, which assume a relatively well-balanced distribution [37].

Since ASRS consists of a variety of heterogeneous data (e.g., text data, categorical data, and numerical data), it is challenging to develop one single model to learn from the entire dataset for predicting the risk associated with the consequence of hazardous events. In this paper, we adopt the “divide and conquer” strategy to split the data into two parts, structured and unstructured, and develop a hybrid model to handle the two types of data, respectively. By doing this, we are able to leverage the strengths of each model in processing a certain type of data. Compared with building a single model, the dimension of the problem is reduced significantly. With respect to categorical data, considering its high-dimensional feature space, deep learning might be a good candidate to discover the highly intricate relationship between event contextual characteristics and event consequence due to its powerful ability to establish a dense representation of the feature space, which makes it effective in learning high-order features from the raw data [38,39]. As a result, a deep learning model is developed to process the
categorical data for the purpose of learning the associations between event contextual features and event outcomes. In parallel, a support vector machine model is trained to identify the relationships between the text-based event synopsis and the risk level associated with the consequence of each incident. Afterwards, the predictions from the two machine learning models are fused together for quantifying the risk associated with the consequence of each incident. Finally, the prediction on risk-level categorization is extended to event-level outcomes through a probabilistic decision tree.

Fig. 1. Distribution of outcomes for all incidents/accidents between January 2006 and December 2017.

Compared to the current state of the art in machine learning for aviation safety, we make the following contributions:

1. We have developed a machine learning methodology to learn the relationships between abnormal event characteristics and their consequences. We focus on a data set that has a large number of outcomes and an imbalance of available data regarding these outcomes. This challenge is overcome by grouping the event outcomes into five risk categories and by up-sampling the minority classes.

2. A probabilistic fusion rule is developed to blend the predictions of multiple machine learning models that are built on different segments of the available data. Specifically, a hybrid model blending SVM prediction on unstructured data and a deep neural network ensemble on structured data is developed to quantify the risk of the consequence of hazardous events, in which the record-level prediction probabilities, the class-level prediction accuracy of each respective model, and the proportion of each class in the records with disagreeing predictions are considered in model fusion.

The rest of the paper is structured as follows. In Section 2, we provide a brief introduction to the ASRS database and describe the principal elements in each record. In Section 3, we develop a hybrid model to learn the complex relationships between event characteristics and event outcomes. In Section 4, we evaluate the performance of the trained machine learning model on a test dataset. In Section 5, we provide concluding remarks and discuss future research directions.

2. Aviation safety reporting system

The Aviation Safety Reporting System (ASRS) is a program operated by the National Aeronautics and Space Administration (NASA) with the ultimate goal of increasing aviation system safety by discovering system safety hazards hidden in the multitude of air traffic operations. Over the past few decades, ASRS has become one of the world's largest sources of information on aviation safety and human factors [24]. As one of its primary tasks, ASRS collects, processes, and analyzes voluntarily submitted aviation incident/situation reports from pilots, flight attendants, air traffic controllers, dispatchers, cabin crew, ground workers, maintenance technicians, and others involved in aviation operations.

Reports submitted to ASRS include both unsafe occurrences and hazardous situations. Each submitted report is first screened by two analysts to provide the initial categorization and to determine the triage of processing. During this process, ASRS analysts identify hazardous situations from the reports and issue an alert message to persons in a position to correct them. Based on the initial categorization, ASRS analysts might aggregate multiple reports on the same event to form one database “record”. If any information needs to be further clarified, ASRS analysts might choose to call a reporter over the telephone to gather more information. After all the necessary information is collected, the reports are codified using the ASRS taxonomy and recorded in the database. Next, some critical information in each record is de-identified; then ASRS distributes the incident/accident records gathered from these reports to all the stakeholders in positions of authority for future evaluation and potential corrective action development.

Table 1 presents a sample situation record extracted from the ASRS database. As can be observed, each record has more than 20 fields ranging from event occurrence location to event characteristics. Here, we briefly introduce several primary attributes in each record:

1. Time and location of the abnormal event: The incident in Table 1 occurred in February 2017 at the BUR airport in the USA. Here, BUR is the International Air Transport Association (IATA) code of the airport, and it refers to the Hollywood Burbank Airport located in Los Angeles County, California. Besides, the record also provides the basic altitude above Mean Sea Level (MSL). In this case, the hazardous event occurred when the flight was at an altitude of 2500 ft.

2. Environment: This describes the surrounding conditions encompassing the aircraft operations. Examples are: flight conditions (VMC, IMC, marginal, or mixed), and weather elements such as visibility, light, and ceiling. Here, VMC refers to visual meteorological conditions under visual flight rules (VFR) flight. In VMC, pilots have sufficient visibility to fly the aircraft while maintaining visual separation from the terrain and other aircraft. Different from VMC, instrument meteorological conditions (IMC) describe weather conditions that require pilots to fly primarily by reference to instruments under instrument flight rules (IFR). Visibility and cloud ceiling are two key factors in determining
whether the weather condition is VMC or IMC, and the boundary between IMC and VMC is known as the VMC minima. “Marginal VMC” refers to conditions above but close to one or more VMC minima.

3. Aircraft: This attribute reports the basic aircraft and flight information, including the aircraft make and model, flight type (personal or commercial), the number of crew onboard, flight plan, flight mission, flight phase, and the airspace class the flight is in. Such information details the specific flight phase in which the hazardous event occurs. The basic aircraft information also enables us to have an understanding of the scale of the possible event consequence.

4. Component: If there is any mechanical failure in the aircraft, the record will have the component field, and it describes which aircraft component is faulty (e.g., engine, nosewheel, transponder, etc.) and the type of the problem as well (e.g., malfunctioning, improperly operated, or others). This field might be empty if there is no component malfunction.

5. Person: This describes the fundamental information of the personnel that reports the problem, for example, the location of the person, his/her location in the aircraft, the reporter organization, the qualification of the reporter, and the contributing human factor (e.g., fatigue, distraction, confusion, and time pressure).

6. Events: This attribute provides the basic characteristics of the anomalous behavior and its consequence. In this record, BUR Tower detected the excursion of the flight from the assigned altitude and issued a course correction to avoid the traffic. After taking the course correction to avoid traffic, the pilot did not maintain the same course heading to Burbank while ATC assumed that the pilot was on the correct heading. As a result, the BUR Tower recognized the error, issued a new heading, and commanded the pilot to climb back to the assigned altitude to perform the landing from the very start.

7. Assessments: Assessments provide an evaluation of the root cause and other factors that contribute to the occurrence of the abnormal event.

8. Synopsis: All the above fields are categorical except the crew size. The last rows of Tables 1 and 2 provide several sample event synopses elicited from the ASRS database. As can be observed, an event synopsis gives a brief summary of the cause of the problem and how the abnormal event evolves over time. Such information is helpful for us to assess the severity of the hazardous event and analyze possible consequences.

The above descriptions provide a brief introduction to the physical meanings of some important fields in each record. In ASRS, other records might differ from the above sample records in certain fields, or they might have additional fields which are absent in the sample record due to differences in the type and characteristics of the abnormal event.

Table 1
A sample incident/accident record extracted from ASRS.

Attribute     Content
Time/Day      Date: 201702; Local Time Of Day: 0601–1200
Place         Locale Reference.Airport: BUR.Airport; State Reference: CA; Altitude.MSL.Single Value: 2500
Environment   Flight Conditions: VMC; Weather Elements/Visibility.Visibility: 10; Light: Daylight; Ceiling.Single Value: 12,000
Aircraft      Reference: X; ATC / Advisory.Center: BUR; Aircraft Operator: Personal; Make Model Name: PA-28R Cherokee Arrow All Series; Crew Size.Number Of Crew: 1; Operating Under FAR Part: Part 91; Flight Plan: VFR; Mission: Personal; Flight Phase: Initial Approach; Route In Use: Visual Approach; Airspace.Class D: VNY
Component     Aircraft Component: Engine Air; Problem: Malfunctioning
Person        Reference: 1; Location Of Person: X; Location In Aircraft: Flight Deck; Reporter Organization: Personal; Function.Flight Crew: Single Pilot; Qualification.Flight Crew: Private; Experience.Flight Crew.Total: 175; Experience.Flight Crew.Last 90 Days: 30; Experience.Flight Crew.Type: 175; ASRS Report Number.Accession Number: 1428684; Human Factors: Confusion
Events        Anomaly.Deviation - Altitude: Excursion From Assigned Altitude; Anomaly.Deviation - Track/Heading: All Types; Anomaly.Deviation - Procedural: Clearance; Anomaly.Inflight Event/Encounter: Unstabilized Approach; Detector.Person: Air Traffic Control; When Detected: In-flight; Result.Flight Crew: Returned To Clearance; Result.Flight Crew: Became Reoriented; Result.Air Traffic Control: Issued New Clearance; Result.Air Traffic Control: Issued Advisory/Alert
Assessments   Contributing Factors/Situations: Airspace Structure; Contributing Factors/Situations: Human Factors; Primary Problem: Human Factors
Synopsis      PA28R pilot reported becoming confused during a VFR flight to BUR and lined up on VNY. BUR Tower detected the error and issued a new heading and a climb back to the assigned altitude.

Table 2
Two examples of event synopsis in ASRS.

Date             Event synopsis
February 2017    A319 Flight Attendant reported smoke in the cabin near the overwing exits during climb.
January 2017     A319 flight crew reported fumes in the flight deck. After a short time they began to experience problems concentrating and the onset of incapacitation.

3. Proposed method

In this section, we develop a hybrid method to estimate the risk of the event consequence with consideration of the operational conditions and event characteristics. To build the hybrid model, we investigate the incidents/accidents that occurred from January 2006 to December 2017. The proposed framework is outlined in Fig. 2, which shows a four-step procedure. In the first step, we perform a risk-based event outcome categorization. That is, we employ the level of risk as a quantitative metric to measure the severity of the event outcome and collapse all the possible event outcomes into five categories: high risk, moderately high risk, medium risk, moderately medium risk, and low risk. Given the restructured categories, two models are developed in the second step to process the unstructured data (text data) and structured data (categorical and numerical information). Specifically, a support vector machine (SVM) model is developed to represent the relationship between the event synopsis and the risk pertaining to the event outcome, and an ensemble of deep neural networks (DNN) is trained to predict the level of risk based on the contextual features of each abnormal event. In the third step, we develop an innovative fusion rule to blend the prediction results from the SVM and DNN models. Finally, the risk-level
3.2. Model construction

The ASRS data can be classified into two groups: structured data and unstructured data. In particular, the structured data include numerical data (e.g., crew size) and categorical data (e.g., flight phase, weather visibility, flight conditions, etc.). In ASRS, the event synopsis is the only unstructured data, and it is used to describe how the accident/incident occurred.

Before developing the two machine learning approaches, we up-sample the minority classes in the restructured risk domain. Basically, up-sampling is the process of randomly duplicating observations from the minority class to reinforce its signal. With respect to the reorganized risk categories, we perform resampling with replacement for the three minority classes, namely high risk, moderately high risk, and moderately medium risk, so that the number of records for them matches that of the majority class (medium risk). After the up-sampling operation, we develop two individual models to handle the structured and unstructured data separately. The flowchart of the developed method is illustrated in Algorithm 1.

Algorithm 1. Hybrid model development.

3.2.1. Support vector machine for text-based classification

Regarding the text data, our objective is to automatically categorize the abnormal event into the correct risk category based on the content of the event synopsis. In the past decades, a large number of statistical and computational methods have been developed for classification based on text data [33,43], for example, Naïve Bayes [44,45], support vector machine [46], maximum entropy [47], and others [48]. The support vector machine (SVM) has been found to provide higher prediction accuracy than most other techniques [49]. Different from other classifiers, SVM implements a structural risk minimization function, which entails finding an optimal hyperplane that separates the different classes with the maximum margin. The introduction of the structural risk minimization (regularization) term also enables SVM to have a good generalization characteristic, thereby guaranteeing a lower classification error on unseen instances.

The first essential step in applying SVM to text classification is to transform the text data into numerical feature vectors; this is referred to as text representation. Since the text descriptions of the incidents/accidents appear in the form of long sentences in different records, we employ the bag-of-words (BoW) representation to transform the text data into numerical features. Specifically, we utilize the tokenizer in the natural language processing toolbox to split raw text into sentences, words, and punctuation [50]. Afterwards, all the stop words and punctuation are eliminated from the BoW representation, and we extract distinct words from all the event synopsis records, assign a fixed integer id to each term present in any event synopsis, and represent the content of a document as a vector in the term space, which is also referred to as a vector space model. However, in many cases, term frequency alone might not be adequate for differentiating the documents. For example, since the word ‘the’ is a very common term that appears in every document, the term frequency method will tend to incorrectly emphasize documents with frequent use of terms (such as ‘the’ and ‘a’) that carry very little information about the contents of the document, while more meaningful terms might be shadowed. As a remedy, the term frequency is reweighted by a smoothed inverse document frequency, yielding the tf-idf weight
tf-idf(t, d) = tf(t, d) × (1 + log((1 + n_d) / (1 + df(d, t))))        (1)

where tf(t, d) is the frequency of term t in document d, n_d is the total number of documents, and df(d, t) is the number of documents containing term t.
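The smoothed weighting in Eq. (1) can be sketched in a few lines. The following is a minimal illustration (not the paper's code) on made-up, already-tokenized synopses, assuming the natural logarithm and stop-word-free input; this smoothed variant mirrors the idf used by scikit-learn's TfidfTransformer:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Smoothed tf-idf per Eq. (1): tf(t,d) * (1 + log((1 + n_d) / (1 + df(t))))."""
    n_docs = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: tf[t] * (1.0 + math.log((1 + n_docs) / (1 + df[t])))
                        for t in tf})
    return weights

# toy "synopses" after tokenization and stop-word removal
docs = [["smoke", "cabin", "climb"],
        ["fumes", "flight", "deck"],
        ["smoke", "flight", "deck"]]
w = tf_idf(docs)
# "cabin" occurs in only one of the three documents, so it receives a
# larger weight than "smoke", which occurs in two of them.
```

Note that a rare term such as "cabin" (df = 1) is weighted 1 + log(4/2), while a more common term such as "smoke" (df = 2) is weighted 1 + log(4/3), so the rarer, more discriminative term dominates the document vector.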
Table 4
Performance metrics for the two trained models. Here, the support column denotes the number of occurrences of each class in the actual observation, the consistent prediction column represents the number of consistent predictions between the two models for each class, and the consistent prediction accuracy denotes the proportion of records in the consistent model predictions that are correctly labeled.

True label 1:  SVM predicted (1-5): A^s_{1,1}, A^s_{1,2}, A^s_{1,3}, A^s_{1,4}, A^s_{1,5} | Support: N_1 | Consistent prediction: N_1^c | Consistent prediction accuracy: λ_1 | DNN predicted (1-5): A^d_{1,1}, A^d_{1,2}, A^d_{1,3}, A^d_{1,4}, A^d_{1,5}
True label 2:  SVM predicted (1-5): A^s_{2,1}, A^s_{2,2}, A^s_{2,3}, A^s_{2,4}, A^s_{2,5} | Support: N_2 | Consistent prediction: N_2^c | Consistent prediction accuracy: λ_2 | DNN predicted (1-5): A^d_{2,1}, A^d_{2,2}, A^d_{2,3}, A^d_{2,4}, A^d_{2,5}
True label 3:  SVM predicted (1-5): A^s_{3,1}, A^s_{3,2}, A^s_{3,3}, A^s_{3,4}, A^s_{3,5} | Support: N_3 | Consistent prediction: N_3^c | Consistent prediction accuracy: λ_3 | DNN predicted (1-5): A^d_{3,1}, A^d_{3,2}, A^d_{3,3}, A^d_{3,4}, A^d_{3,5}
True label 4:  SVM predicted (1-5): A^s_{4,1}, A^s_{4,2}, A^s_{4,3}, A^s_{4,4}, A^s_{4,5} | Support: N_4 | Consistent prediction: N_4^c | Consistent prediction accuracy: λ_4 | DNN predicted (1-5): A^d_{4,1}, A^d_{4,2}, A^d_{4,3}, A^d_{4,4}, A^d_{4,5}
True label 5:  SVM predicted (1-5): A^s_{5,1}, A^s_{5,2}, A^s_{5,3}, A^s_{5,4}, A^s_{5,5} | Support: N_5 | Consistent prediction: N_5^c | Consistent prediction accuracy: λ_5 | DNN predicted (1-5): A^d_{5,1}, A^d_{5,2}, A^d_{5,3}, A^d_{5,4}, A^d_{5,5}
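The shared-attribute columns of Table 4 (per-class support N_i, consistent predictions N_i^c, and consistent-prediction accuracy λ_i) can be computed directly from the two models' validation-set predictions. A hedged sketch follows; the function name and array layout are our own, assuming integer class labels 1 through n_classes:

```python
import numpy as np

def shared_attributes(y_true, pred_svm, pred_dnn, n_classes=5):
    """Per-class support N_i, consistent predictions N_i^c, and
    consistent-prediction accuracy lambda_i for two classifiers."""
    y_true = np.asarray(y_true)
    pred_svm = np.asarray(pred_svm)
    pred_dnn = np.asarray(pred_dnn)
    support, consistent, lam = [], [], []
    for c in range(1, n_classes + 1):
        in_class = y_true == c                     # records whose true label is c -> N_c
        agree = in_class & (pred_svm == pred_dnn)  # the two models give the same label -> N_c^c
        n_agree = int(agree.sum())
        # lambda_c: fraction of the agreeing predictions that are also correct
        n_correct = int((agree & (pred_svm == y_true)).sum())
        support.append(int(in_class.sum()))
        consistent.append(n_agree)
        lam.append(n_correct / n_agree if n_agree else 0.0)
    return support, consistent, lam

# tiny illustration with three classes
sup, con, lam = shared_attributes(y_true=[1, 1, 2, 2, 2],
                                  pred_svm=[1, 2, 2, 2, 3],
                                  pred_dnn=[1, 3, 2, 2, 3],
                                  n_classes=3)
```

In the illustration, class 2 has support 3, all three of its records receive consistent predictions, but only two of those agreeing predictions are correct, so λ_2 = 2/3.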
decimal into an integer, the model prediction (3) after the transformation might be the least probable prediction in the two models. Consequently, the weighted sum operation is inappropriate for this problem.

We develop a probabilistic fusion rule to blend the predictions from the two models. The framework of the proposed fusion rule is demonstrated in Fig. 4. In the first place, if the two models have the same prediction (prediction 1 = prediction 2), then the prediction result is easily determined. If the two models have different predictions, then we calculate the probability of the test record belonging to each class among the five risk categories in each model. To illustrate the proposed method, Table 4 reports the performance metrics of the two models on a validation dataset. The validation dataset consists of N_1, N_2, N_3, N_4, and N_5 records in the five respective classes, where A^s_{i,j} and A^d_{i,j} represent the number of records that actually belong to class i but are labeled as class j by the support vector machine and the DNN ensemble, respectively. The third to the seventh columns present the confusion matrix of the trained support vector machine on the validation dataset, while the confusion matrix of the deep learning ensemble is reported in the thirteenth to seventeenth columns. The ninth column of Table 4 reports the number of consistent predictions between the two trained models on the validation dataset, while the tenth column reports the accuracy of the correctly labeled records among the consistent predictions of the two models in the validation dataset.

When the two models have disagreeing predictions, one important principle underlying the fusion strategy is: since no model is perfect, each model is expected to have misclassifications or make erroneous predictions. However, the valuable information embodied in the misclassifications should be further utilized to correct the model predictions on the subsequent unseen test dataset, in a way that compensates for the class that the observation should belong to if we know how often the model mislabels that class as other classes. Considering the various types of misclassifications that the trained model is prone to make, one way to achieve this objective is to increase the probability of labeling the observation as the correct class, while reducing the probability of labeling the record as the predicted class given by the model. Fortunately, such information can be derived from the confusion matrix. In fact, a confusion matrix provides not only class-level model prediction accuracy, but also the probability of mislabeling a record as other classes. Next, we utilize the model misclassification probability to correct the model prediction on new test records. Suppose the two models have different predictions on a new test record a; there are three important considerations in determining the probability of the new test record a belonging to each of the five classes i in the two models:

1. Record-level prediction probabilities: As mentioned earlier, two models have been trained: a support vector machine and a deep neural network ensemble. The record-level probability p(Ŷ_a = i) measures the probability that the trained model assigns the test record a to a given class i. To be specific, with respect to the DNN ensemble, the record-level prediction probability can be measured as the ratio of the most frequent prediction to the total number of model predictions (which is 10, in this case). For example, if the most frequent prediction of the ten models is class 5, and it appears six times out of the ten model predictions, then the model prediction probability for this particular record belonging to class 5 is 6/10 (0.6). Regarding the support vector machine, since samples far away from the separating hyperplane are presumably more likely to be classified correctly, we can use the distance from the hyperplane as a measure of record-level prediction probability, following the method introduced in Ref. [58].

2. Proportion of each class in the records with disagreeing predictions: When the two models are trained, there are the same number of records belonging to each class in the training dataset. In other words, the data is balanced for each class in the training dataset. However, when we use the trained models to make predictions on test records, we only need decisions when the two trained models have inconsistent predictions. With respect to the set of disagreeing predictions, the actual proportion of records belonging to each class might be different (imbalanced) from the training dataset (balanced). The use of a model trained on the balanced class distribution will result in a biased estimator if it is directly utilized for making predictions on the set of records with inconsistent model predictions.

To address this issue, we introduce a weight factor to correct the trained model to fit the class distribution in the set of records with disagreeing model predictions, thereby ensuring that the updated estimator is unbiased. The number of records with consistent model predictions in each class is complementary to the number of records with inconsistent model predictions in that class. In general, the more records the two classifiers agree on, the fewer records they disagree on. As a result, we formulate the following equation to represent the total number of records with inconsistent model predictions across all the classes considered:

T = Σ_{i=1}^{5} (N_i − λ_i × N_i^c)    (2)

where i is the class label, N_i denotes the actual number of records that should have been labeled as class i, N_i^c represents the number of consistent predictions between the two models on the records in class i, and λ_i is the model accuracy with respect to the consistent predictions.

Considering the total number of disagreeing predictions across all the classes, the proportion w.r.t. class j is:

p(j) = (N_j − λ_j × N_j^c) / T    (3)
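Eqs. (2) and (3) can be sketched in a few lines. The counts below are the validation-set values reported later in Section 4.2: the support N_i per class and the number of correctly labeled consistent predictions λ_i × N_i^c per class.

```python
# Sketch of Eqs. (2)-(3): proportion of each class among the records on
# which the two classifiers disagree. Validation counts from Section 4.2.
support = [947, 1017, 988, 1034, 976]           # N_i
correct_consistent = [401, 922, 339, 897, 804]  # lambda_i * N_i^c

# Eq. (2): total number of records with inconsistent model predictions
T = sum(n - c for n, c in zip(support, correct_consistent))

# Eq. (3): class proportions among the disagreeing records
p = [(n - c) / T for n, c in zip(support, correct_consistent)]

print(T)  # 1599
print([round(x, 2) for x in p])
```

Up to rounding, this reproduces the proportions p = [0.34 0.06 0.40 0.09 0.11] used in the worked example of Section 4.2.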
X. Zhang, S. Mahadevan Decision Support Systems 116 (2019) 48–63
Given the already correctly labeled samples by the two trained models, the probability that a new test record a which results in disagreeing predictions between the two models belongs to class j is p(j).

3. Class-level accuracy: This measures the degree of consistency between model predictions and actual observations. Mathematically, it can be represented as p(Y = i | Ŷ = j), where Ŷ = j represents the samples with model predictions being class j, and p(Y = i | Ŷ = j) (j ≠ i) quantifies the proportion of samples with model predictions being class j that should have been labeled as class i. In particular, when j = i, p(Y = i | Ŷ = i) measures the ratio of samples actually belonging to class i to the number of samples with model predictions being class i, which can also be referred to as a likelihood function. In other words, the likelihood describes how the samples predicted as a given class distribute over the true classes.

The blended probabilities are normalized such that the sum of the probability over the five classes in each model is 1:

p(Y_a = i) = [Σ_{j=1}^{5} p(Y = i | Ŷ = j) × p(Ŷ_a = j) × (N_j − λ_j × N_j^c)/T] / [Σ_{i=1}^{5} Σ_{j=1}^{5} p(Y = i | Ŷ = j) × p(Ŷ_a = j) × (N_j − λ_j × N_j^c)/T]    (7)

With respect to the support vector machine, the likelihood function p(Y = i | Ŷ = j) is calculated as:

p^s(Y = i | Ŷ = j) = A^s_{i,j} / Σ_{i=1}^{5} A^s_{i,j}    (8)

where A^s_{i,j} denotes the number of samples with model predictions being class j, but actually belonging to class i, in the validation dataset.

In a similar way, the metric p(Y = i | Ŷ = j) in the DNN ensemble is computed accordingly. Given the model classification performance p(Y = i | Ŷ = j), the probability of the test record a belonging to each class in each model is computed as below:

p(Y_a^s = i) ∝ Σ_{j=1}^{5} p^s(Y = i | Ŷ = j) × p^s(Ŷ_a = j) × (N_j − λ_j × N_j^c) / Σ_{k=1}^{5} (N_k − λ_k × N_k^c)

sim(d_i, d_j) = (Σ_{m=1}^{k} w_{m,i} × w_{m,j}) / (√(Σ_{m=1}^{k} w_{m,i}²) × √(Σ_{m=1}^{k} w_{m,j}²))    (10)

where d_i and d_j denote two documents, w_{m,i} (w_{m,j}) is the frequency of object (words, or terms) o_m present in document d_i (d_j), and k is the number of unique objects across all the documents. The similarity metric defined in Eq. (10) measures the cosine of the angle between the vector-based representations of the two documents. Since there are multiple documents belonging to the same risk category as the test record a, we formulate the following equation to address such issues:
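Putting Eqs. (7) and (8) together, the blending step can be sketched in plain Python. The confusion counts are the SVM values from Table 7, and the two probability vectors are those used in the worked example of Section 4.2:

```python
# Sketch of Eqs. (7)-(8): column-wise likelihoods p(Y=i | Yhat=j) from a
# confusion matrix, blended with the record-level probabilities and the
# disagreeing-set class proportions, then normalized over the five classes.
# conf[i][j] counts records of true class i predicted as class j (Table 7, SVM).
conf = [[550,  82, 205,  69,  41],
        [ 14, 951,  27,  13,  12],
        [153,  65, 592, 119,  59],
        [ 16,   7,  43, 943,  25],
        [ 20,  17,  34,  22, 883]]

p_pred  = [0.10, 0.20, 0.20, 0.20, 0.30]  # record-level p(Yhat_a = j), Table 8
p_class = [0.34, 0.06, 0.40, 0.09, 0.11]  # disagreeing-set proportions p(j)

col_sum = [sum(row[j] for row in conf) for j in range(5)]
# Eq. (8) applied column-wise, folded into the Eq. (7) numerator
unnorm = [sum(conf[i][j] / col_sum[j] * p_pred[j] * p_class[j]
              for j in range(5)) for i in range(5)]
total = sum(unnorm)
p_final = [u / total for u in unnorm]
print([round(x, 2) for x in p_final])
```

Up to rounding of the intermediate likelihoods, this reproduces the worked example in Section 4.2, which assigns record a to class 3.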
Table 5
The number of records belonging to each risk category.

Class             | High   | Moderately high | Medium | Moderately medium | Low
Number of records | 12,327 | 8261            | 18,841 | 8636              | 16,508

Table 6
Confusion matrix.

                  | Predicted as positive | Predicted as negative
Actually positive | True Positives (TP)   | False Negatives (FN)
Actually negative | False Positives (FP)  | True Negatives (TN)
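The Table 6 cells feed directly into the three metrics used in Section 4 (precision, recall, F1). A minimal sketch with illustrative counts:

```python
# Sketch of the three metrics built from the Table 6 confusion-matrix
# cells; the counts below are made up for illustration.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)       # correct positives over predicted positives
    recall = tp / (tp + fn)          # correct positives over actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.8 0.8 0.8
```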
Table 7
Performance metrics for the two trained models. Here, the support column denotes the number of occurrences of each class in the actual observation, the consistent prediction column represents the number of consistent predictions between the two models for each class, and consistent prediction accuracy denotes the proportion of records in the consistent model predictions that are correctly labeled.

True label | SVM predicted label (1 2 3 4 5) | Support | Consistent prediction | Consistent prediction accuracy | DNN predicted label (1 2 3 4 5)
1 | 550  82  205  69  41  |  947 | 491 | 0.82 | 567  74  204  46  56
2 |  14 951   27  13  12  | 1017 | 988 | 0.93 |  25 931   28   5  28
3 | 153  65  592 119  59  |  988 | 448 | 0.76 | 208  72  495  97 116
4 |  16   7   43 943  25  | 1034 | 936 | 0.96 |  19  11   48 927  29
5 |  20  17   34  22 883  |  976 | 861 | 0.93 |  38  27   45  28 838
same amount of data as in the majority class to form a balanced dataset. After up-sampling, the dataset contains 18,841 records for each of the high, moderately high, medium, and moderately medium risk classes, and 16,508 records in the low risk category. After the up-sampling operation, there are 91,872 records in total.

All the input variables are summarized in Fig. 5, and they are grouped into two categories: unstructured data and structured data. The structured data includes 29 different variables ranging from aircraft information to event characteristics, while the unstructured (text) data only includes the event synopsis. With the 29 structured variables and one unstructured variable, we train the SVM and DNN models.

Given a trained classifier and a test dataset, the relationship between model predictions and true observations can be represented as a confusion matrix, as illustrated in Table 6. To assess the performance of every machine learning model, we adopt the three most commonly used performance metrics.

1. Precision: Mathematically, Precision = TP/(TP + FP) is the ratio of correctly predicted positive observations to the total predicted positive observations, and high precision typically corresponds to few false positive predictions.
2. Recall: Recall = TP/(TP + FN) quantifies the ratio of correctly labeled positive observations to all the positive observations in the actual class. It measures the ability of the trained model to identify positive observations from all the samples that should have been labeled as positive.
3. The F1 score is the weighted average of precision and recall, mathematically described as F1 = 2 × (precision × recall)/(precision + recall). In the F1 score, the relative contribution of precision and recall is the same. In other words, the F1-measure is the harmonic mean of precision and recall.

4.2. Experimental analysis

We split the 91,872 records into three parts: training set (85%), validation set (5%), and test set (10%). The training set is used to guide the machine learning algorithms to optimize the relevant parameters so as to minimize prediction error; the validation set is utilized to develop the performance metrics for an unseen dataset (i.e., prediction accuracy). With the performance metrics obtained from the validation dataset, we then blend the predictions from the two trained models for the test dataset.

Regarding the SVM estimator for text classification, we leverage grid search to optimize the relevant hyperparameters. Specifically, we optimize the loss function (hinge, log, perceptron, squared loss, etc.), the regularization function (l1, l2, or elasticnet), the coefficient of the regularization function ([10^−2, 10^−5]), and the range of the N-gram (an N-gram is a contiguous sequence of n items from a given sample of text). The grid search exhaustively generates candidates from the set of parameter values, evaluates all the possible combinations of parameter values on the training dataset, and selects the best combination. After the optimal hyperparameters are identified, we run the trained support vector machine algorithm on the validation data to obtain its performance metrics. The left part of Table 7 reports the performance metrics of the trained support vector machine model on the validation dataset.

In the DNN model, all the categorical features are encoded using one-hot encoding. We randomly select 85% of the records from the training dataset to train a deep neural network, then we repeat the same procedure ten times to obtain an ensemble of deep neural networks. Every DNN has the same structure – 8 hidden layers and 40 neurons per layer. We choose an Adam optimizer [61] with a learning rate of 0.001 to perform backward propagation in adjusting the weight variables, with the objective of minimizing the categorical cross entropy as defined in Eq. (13):

L(Y, Ŷ) = −(1/n) Σ_{i=1}^{n} [Y_i log Ŷ_i]    (13)

where n is the number of training samples, Y_i is a one-hot-encoding representation of the actual observation with the corresponding class label being 1 and the values of other classes being zero, and Ŷ_i is the probabilistic estimation of the ensemble deep learning models on record i.

After the ten deep neural networks are trained, we use them to make predictions on the validation dataset, and the results are reported in Table 7. It is observed that the DNN ensemble does not perform as well as the SVM model in terms of precision and recall for classes 2, 3, and 5, due to the low level of information contained in the categorical features. As shown in the ninth column of Table 7, the two trained models agree on 491, 988, 448, 936, and 861 predictions with respect to the five classes. Among the consistent predictions of the two models, 401, 922, 339, 897, and 804 predictions with respect to the five classes are correctly classified, from which we can compute the ratio of consistent predictions that are correctly labeled in the validation dataset. Next, we utilize the two trained models to make predictions on the test dataset. With the performance metrics acquired on the validation dataset, we blend the predictions of the two models. Suppose the two models have probabilistic classifications on the test record a as shown in Table 8; each cell represents the probability that the test example a is a member of the class. Regarding the test record a, the SVM model supports class 5 the most, whereas the DNN ensemble assigns the highest probability to class 2.

Table 8
Model predictions with respect to test record a.

Model | 1    | 2    | 3    | 4    | 5
SVM   | 0.10 | 0.20 | 0.20 | 0.20 | 0.30
DNN   | 0.20 | 0.40 | 0.10 | 0.02 | 0.28

Following the method introduced before, the total number of inconsistent model predictions is T = (947 − 401) +
(1017 − 922) + (988 − 339) + (1034 − 897) + (976 − 804) = 1599. Then the proportion of each class in the disagreeing records is calculated as:

p = [0.34 0.06 0.40 0.09 0.11]

Then the probability of assigning the test record a to class 1 in the SVM model is computed as:

p(Y_a^s = 1) ∝ Σ_{j=1}^{5} [p(Y = 1 | Ŷ = j) × p(Ŷ_a^s = j) × p(j)]
= 550/(550 + 14 + 153 + 16 + 20) × 0.1 × 0.34 + 82/(82 + 951 + 65 + 7 + 17) × 0.2 × 0.06
+ 205/(205 + 27 + 592 + 43 + 34) × 0.2 × 0.40 + 69/(69 + 13 + 119 + 943 + 22) × 0.2 × 0.09
+ 41/(41 + 12 + 59 + 25 + 883) × 0.3 × 0.11
= 0.0468

In a similar way, the probabilities of labeling the test record a as the other classes in the SVM model are calculated:

p(Y_a^s = 2) ≈ 0.0137, p(Y_a^s = 3) ≈ 0.0646, p(Y_a^s = 4) ≈ 0.0193, p(Y_a^s = 5) ≈ 0.0324

After normalization, the probability of the test record belonging to each class in the SVM model is:

p(Y_a^s = 1) = 0.26, p(Y_a^s = 2) = 0.08, p(Y_a^s = 3) = 0.37, p(Y_a^s = 4) = 0.11, p(Y_a^s = 5) = 0.18.

Likewise, with the DNN ensemble, the probabilities of the test record a belonging to classes 1 to 5 are calculated as:

p(Y_a^d = 1) = 0.35, p(Y_a^d = 2) = 0.15, p(Y_a^d = 3) = 0.28, p(Y_a^d = 4) = 0.04, p(Y_a^d = 5) = 0.18.

From the above computational results, test record a has the highest probability (0.37) of being labeled as class 3 in the SVM model. As a result, this test record a is assigned to class 3 in the hybrid model. For the remaining test records, we blend the two model predictions in a similar manner.

To compare the performance of all the investigated models, we randomly split the original dataset into ten equal-sized groups to perform ten-fold cross-validation. The procedure presented in Algorithm 1 is repeated over the ten equal-sized groups. Fig. 6 shows the proportions of the five risk categories in the records with disagreeing predictions. Among the five risk categories, the medium risk class has the largest proportion of records with disagreeing predictions, followed by the low risk and high risk classes. The proportion of each class with disagreeing predictions is relatively stable, with small variability. Due to the imbalanced class distribution in the records with disagreeing predictions, the adjustment factor introduced in Eq. (4) plays an essential role in embodying such information in the subsequent hybrid model predictions.

In addition, we implement ordinal logistic regression (OLR) for the purpose of comparing its performance with the DNN ensemble on the structured data [62,63]. The computational results of ten-fold cross-validation for the four models are demonstrated as a boxplot in Fig. 7. It is worth noting that we shift the values of the three performance indicators of ordinal logistic regression by +0.4 for the sake of demonstrating the four models' performance deviation clearly. As can be observed, the hybrid model outperforms SVM, DNN ensemble, and OLR in terms of precision, recall, and F1 score. More importantly, the hybrid model has a much smaller deviation for all the performance metrics when compared to the other three models. In other words, it has the most stable prediction capability, thereby making it the best candidate for quantifying the risk of abnormal events. Since OLR is inferior to the
Fig. 7. The performance of hybrid model versus support vector machine (SVM), ensemble of deep neural networks (DNN), and ordinal logistic regression (OLR).
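The fold-wise comparison against a null hypothesis of equal means can be sketched as a paired t-test on per-fold scores. The F1 values below are illustrative, not the paper's:

```python
import math
from statistics import mean, stdev

# Illustrative per-fold F1 scores for two models over ten-fold cross-validation
hybrid_f1 = [0.81, 0.80, 0.82, 0.81, 0.80, 0.82, 0.81, 0.80, 0.81, 0.82]
svm_f1    = [0.78, 0.77, 0.79, 0.78, 0.76, 0.79, 0.78, 0.77, 0.78, 0.79]

# Paired t-statistic on the per-fold differences (null: equal mean scores)
diffs = [h - s for h, s in zip(hybrid_f1, svm_f1)]
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Two-sided critical value for 9 degrees of freedom at alpha = 0.05 is 2.262;
# |t| above it rejects the null hypothesis of equal means.
print(abs(t_stat) > 2.262)  # True
```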
Table 9
Event synopsis of test record a.

Date: March 2017
Event synopsis: After initiating descent on a visual approach with glideslope out of service; an A319 Flight Crew initiated a go-around when flight director caused airspeed increase and climb to intercept altitude set for ILS to previously assigned runway.
other three models, we do not analyze its performance any further. From a quantitative viewpoint, based on the cross-validation results, the proposed hybrid model yields a better performance in precision, with an average score of 0.81, 3% higher than the score of the SVM and 6% higher than that of the DNN ensemble. The proposed hybrid model also outperforms the other two models regarding the recall rate. In other words, the hybrid model has a better performance in correctly identifying the records that actually belong to each class. Besides, the F1 score of the hybrid model is 3% higher than the support vector machine and 6% higher than the deep neural networks on average. Regarding the predictions on the structured data, ordinal logistic regression performs much worse than the deep learning ensemble. In summary, the proposed decision rule to fuse the predictions from the two models is effective in enhancing the hybrid model performance.

In addition to the boxplot illustrated in Fig. 7, a statistical t-test is also used to check whether there is a significant difference in the prediction performance of the four models [64]. Specifically, we make a null hypothesis that the means of the populations from the hybrid model and any other individual model are the same; rejection of this hypothesis indicates that there is sufficient evidence that the means of the populations are significantly different, while failing to reject this hypothesis reveals that the distributions are identical. The t-test rejected the null hypothesis for all three aforementioned performance indicators, thus implying that there is a statistically significant improvement in the performance of the hybrid model compared to the SVM, DNN ensemble, and OLR models.

Fig. 8 shows the confusion matrices of the three trained models for the five considered classes in one test case. It can be observed that the hybrid method significantly increases the number of correct predictions for class 1 and class 3, while maintaining the prediction accuracy for the remaining three classes at almost the same level as the other two algorithms. The confusion matrices in Fig. 8 provide comprehensive information in terms of the number of correctly identified observations in each class. Across the five risk groups, the hybrid model correctly identifies 200 more observations than the SVM model. Besides, the confusion matrices also embody the misclassification information. As illustrated in Fig. 8, it is most probable for all three models to misclassify the records in class 1 as class 3, and vice versa. Such information can be utilized to guide the further refinement of the hybrid model.

Considering that the test record a is labeled as class 3 (medium risk), and the event synopsis of the test record a is shown in Table 9, a tree is built to demonstrate the likelihood of the occurrence of every event outcome in the medium risk category. By measuring the similarity between the event synopsis of test record a and that of the other records in the medium risk category, the probability of test record a having each outcome is obtained, and the result is illustrated in Fig. 9. The red filled node is a chance node used to identify the event in a decision tree where a degree of uncertainty exists. In this case, since the hybrid model does not have the capability to make event-level outcome predictions, we expand the risk-level prediction to an event-level outcome prediction by considering all the possible event outcomes under the corresponding risk category. Along each line is shown the probability of each event outcome occurring. As can be seen, it is most probable for the test record a to have the event outcome "Flight Crew Executed Go Around / Missed Approach", followed by "Flight Crew Became Reoriented" and "Air Traffic Control Issued New Clearance". With respect to other risk categories, similar diagrams can be constructed to represent event-level outcomes. Such event-level outcomes make it possible to connect the root cause (i.e., malfunction) and the consequence of the incident at the event outcome level.

5. Conclusion

This paper developed a hybrid model by blending a support vector machine and an ensemble of deep neural networks to quantify the risk pertaining to the consequence of hazardous events in the air transportation system. The SVM model is trained using the event synopsis, while the DNN ensemble is trained using categorical and numerical data. By merging the predictions from the two models, we formulate a hybrid
Fig. 9. The probabilistic event outcomes for test record a. (For interpretation of the references to color in this figure, the reader is referred to the web version of this
article.)
model to assess the severity of abnormal event outcomes in terms of their risk levels, using 64,573 reports on incidents/accidents that were reported between January 2006 and December 2017.

Several contributions have been made in this paper. First, we develop a risk-based event outcome categorization strategy to project the event outcomes into the space of risk quantification by collapsing the original 36 unique event outcomes into five risk groups. Secondly, this paper proposes a support vector machine and deep learning-based hybrid model to make predictions on the risk level associated with the event outcome by analyzing the event contextual features and event description in an integrated way. Thirdly, an innovative fusion rule is developed to blend the predictions from the two trained machine learning algorithms. Finally, a probabilistic tree is constructed to map the risk-level prediction to event-level outcomes. The results demonstrate that the developed hybrid model outperforms the individual models in terms of precision, recall, and F1 score.

Future work can be carried out in the following directions. If the actions taken by the pilot or other operators involved in the response to abnormal events are available, it will be useful to extend the developed hybrid model to account for such important information. The inclusion of such information helps us to identify risk mitigation actions for unforeseen events in the future by learning from the actions that have been taken in the past. Another direction worthy of investigation is to leverage data mining and pattern recognition techniques to discover the intricate event sequences by incorporating the more comprehensive reports on severe accidents available from the National Transportation Safety Board (NTSB). Since many events interact with each other and tend to exhibit certain patterns, it is helpful to identify such interactions from the available records in ASRS, and to characterize them in a rigorous manner. The modeling of such interactions can facilitate our understanding of the evolution of abnormal events, and help to develop corrective strategies. Last but not least, root cause analysis needs to be performed to identify the major contributing factors and variables that lead to the occurrence of incidents with high-risk consequences. Along this direction, appropriate permutation-based significance measures or other alternative methods need to be utilized to recognize the important incident contributing factors. The identification of crucial contributing variables enables risk-informed decision making, thereby aiding the implementation of a proactive safety paradigm in the future.

Acknowledgement

The research reported in this paper was supported by funds from the NASA University Leadership Initiative program (Grant No. NNX17AJ86A, Project Technical Monitor: Dr. Kai Goebel) through a subcontract to Arizona State University (Principal Investigator: Dr. Yongming Liu). The support is gratefully acknowledged.

References

[1] IATA forecasts passenger demand to double over 20 years, http://www.iata.org/pressroom/pr/Pages/2016-10-18-02.aspx, Accessed: 2017-12-28.
[2] X. Zhang, S. Mahadevan, Aircraft re-routing optimization and performance assessment under uncertainty, Decision Support Systems 96 (2017) 67–82.
[3] The Next Generation Air Transportation System (NextGen), https://www.nasa.gov/sites/default/files/atoms/files/nextgen_whitepaper_06_26_07.pdf, Accessed: 2018-02-08.
[4] P. Fleurquin, J.J. Ramasco, V.M. Eguiluz, Systemic delay propagation in the US airport network, Scientific Reports 3 (2013) 1159.
[5] N.B. Sarter, H.M. Alexander, Error types and related error detection mechanisms in the aviation domain: an analysis of aviation safety reporting system incident reports, The International Journal of Aviation Psychology 10 (2) (2000) 189–206.
[6] I. Hwang, C.E. Seah, Intent-based probabilistic conflict detection for the next generation air transportation system, Proceedings of the IEEE 96 (12) (2008) 2040–2059.
[7] C. Barnhart, D. Fearing, V. Vaze, Modeling passenger travel and delays in the national air transportation system, Operations Research 62 (3) (2014) 580–601.
[8] K. Ng, C. Lee, F. Chan, A robust optimisation approach to the aircraft sequencing and scheduling problem with runway configuration planning, Industrial Engineering and Engineering Management (IEEM), 2017 IEEE International Conference on, IEEE, 2017, pp. 40–44.
[9] L.F. Vismari, J.B.C. Junior, A safety assessment methodology applied to CNS/ATM-based air traffic control system, Reliability Engineering & System Safety 96 (7) (2011) 727–738.
[10] F. Netjasov, M. Janic, A review of research on risk and safety modelling in civil aviation, Journal of Air Transport Management 14 (4) (2008) 213–220.
[11] D. McCallie, J. Butts, R. Mills, Security analysis of the ADS-B implementation in the next generation air transportation system, International Journal of Critical Infrastructure Protection 4 (2) (2011) 78–87.
[12] K. Margellos, J. Lygeros, Toward 4-D trajectory management in air traffic control: a study based on Monte Carlo simulation and reachability analysis, IEEE Transactions
on Control Systems Technology 21 (5) (2013) 1820–1833.
[13] S.J. Landry, X.W. Chen, S.Y. Nof, A decision support methodology for dynamic taxiway and runway conflict prevention, Decision Support Systems 55 (1) (2013) 165–174.
[14] C. Di Ciccio, H. Van der Aa, C. Cabanillas, J. Mendling, J. Prescher, Detecting flight trajectory anomalies and predicting diversions in freight transportation, Decision Support Systems 88 (2016) 1–17.
[15] S. Sankararaman, I. Roychoudhury, X. Zhang, K. Goebel, Preliminary Investigation of Impact of Technological Impairment on Trajectory-Based Operations, 17th AIAA Aviation Technology, Integration, and Operations Conference, AIAA Aviation Forum, Denver, Colorado, United States, 2017.
[16] H.-J. Shyur, A quantitative model for aviation safety risk assessment, Computers & Industrial Engineering 54 (1) (2008) 34–44.
[17] X. Chen, I. Bose, A.C.M. Leung, C. Guo, Assessing the severity of phishing attacks: a hybrid data mining approach, Decision Support Systems 50 (4) (2011) 662–672.
[18] K. Barker, J.E. Ramirez-Marquez, C.M. Rocco, Resilience-based network component importance measures, Reliability Engineering & System Safety 117 (2013) 89–97.
[19] A. Roelen, R. Wever, A. Hale, L. Goossens, R. Cooke, R. Lopuhaä, M. Simons, P. Valk, Causal modeling for integrated safety at airports, Proceedings of ESREL, 2003, pp. 1321–1327.
[20] D.A. Wiegmann, S.A. Shappell, A Human Error Approach to Aviation Accident Analysis: The Human Factors Analysis and Classification System, Routledge, 2017.
[21] K.L. McFadden, E.R. Towell, Aviation human factors: a framework for the new millennium, Journal of Air Transport Management 5 (4) (1999) 177–184.
[22] N. Oza, J.P. Castle, J. Stutz, Classification of aeronautics system health and safety documents, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 39 (6) (2009) 670–680.
[23] S. Budalakoti, A.N. Srivastava, M.E. Otey, Anomaly detection and diagnosis algorithms for discrete symbol sequences with applications to airline safety, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 39 (1) (2009) 101–113.
[24] ASRS Program Briefing, https://asrs.arc.nasa.gov/docs/ASRS_ProgramBriefing2016.pdf, Accessed: 2017-12-28.
[25] Y. Liu, C. Jiang, H. Zhao, Using contextual features and multi-view ensemble learning in product defect identification from online discussion forums, Decision Support Systems 105 (2018) 1–12.
[26] T. Joachims, Text categorization with support vector machines: learning with many relevant features, European Conference on Machine Learning, Springer, 1998, pp. 137–142.
[27] Y. Wang, W. Xu, Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud, Decision Support Systems 105 (2018) 87–95.
[28] F. Wang, T. Xu, T. Tang, M. Zhou, H. Wang, Bilevel feature extraction-based text mining for fault diagnosis of railway systems, IEEE Transactions on Intelligent Transportation Systems 18 (1) (2017) 49–58.
[29] R. Chen, Y. Zheng, W. Xu, M. Liu, J. Wang, Secondhand seller reputation in online
Citeseer, 1998, pp. 41–48.
[45] X. Zhang, S. Mahadevan, X. Deng, Reliability analysis with linguistic data: an evidential network approach, Reliability Engineering & System Safety 162 (2017) 111–121.
[46] J. Lilleberg, Y. Zhu, Y. Zhang, Support vector machines and word2vec for text classification with semantic features, Cognitive Informatics & Cognitive Computing (ICCI*CC), 2015 IEEE 14th International Conference on, IEEE, 2015, pp. 136–140.
[47] S. Zhu, X. Ji, W. Xu, Y. Gong, Multi-labelled classification using maximum entropy method, Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2005, pp. 274–281.
[48] D. Isa, L.H. Lee, V. Kallimani, R. Rajkumar, Text document preprocessing with the Bayes formula for classification using the support vector machine, IEEE Transactions on Knowledge and Data Engineering 20 (9) (2008) 1264–1272.
[49] S. Chakrabarti, S. Roy, M.V. Soundalgekar, Fast and accurate text classification via multiple linear discriminant projections, The VLDB Journal 12 (2) (2003) 170–185.
[50] Natural Language Toolbox, https://www.nltk.org/, Accessed: 2018-04-02.
[51] K. Spärck Jones, IDF term weighting and IR research lessons, Journal of Documentation 60 (5) (2004) 521–523.
[52] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[53] J. Evermann, J.-R. Rehse, P. Fettke, Predicting process behaviour using deep learning, Decision Support Systems 100 (2017) 129–140.
[54] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[55] G. Hinton, L. Deng, D. Yu, G.E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath, et al., Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Processing Magazine 29 (6) (2012) 82–97.
[56] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, Journal of Machine Learning Research 12 (2011) 2493–2537.
[57] M. Woźniak, M. Graña, E. Corchado, A survey of multiple classifier systems as hybrid systems, Information Fusion 16 (2014) 3–17.
[58] B. Zadrozny, C. Elkan, Transforming classifier scores into accurate multiclass probability estimates, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2002, pp. 694–699.
[59] S. Sankararaman, S. Mahadevan, Model validation under epistemic uncertainty, Reliability Engineering & System Safety 96 (9) (2011) 1232–1241.
[60] P. Ahlgren, C. Colliander, Document-document similarity approaches and science mapping: experimental comparison of five approaches, Journal of Informetrics 3 (1) (2009) 49–63.
[61] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representation, ACM, 2015, pp. 1–13.
[62] J.D. Rennie, N. Srebro, Loss functions for preference levels: regression with discrete ordered labels, Proceedings of the IJCAI multidisciplinary workshop on advances in
markets: a text analytics framework, Decision Support Systems 108 (2018) 96–106. preference handling, Kluwer, Norwell, MA, 2005, pp. 180–186.
[30] A.S. Abrahams, W. Fan, G.A. Wang, Z.J. Zhang, J. Jiao, An integrated text analytic [63] F. Pedregosa-Izquierdo, Feature Extraction and Supervised Learning on fMRI: From
framework for product defect discovery, Production and Operations Management Practice to Theory, Ph.D. thesis Université Pierre et Marie Curie-Paris VI, 2015.
24 (6) (2015) 975–990. [64] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms, Chapman and Hall/
[31] G. Guo, H. Wang, D. Bell, Y. Bi, K. Greer, Using kNN model for automatic text CRC, 2012.
categorization, Soft Computing 10 (5) (2006) 423–430.
[32] M.-L. Zhang, Z.-H. Zhou, Multilabel neural networks with applications to functional Xiaoge Zhang received his B.S. degree from Chongqing University of Posts and
genomics and text categorization, IEEE Transactions on Knowledge and Data Telecommunications in 2011, and the M.S. degree from Southwest University in 2014,
Engineering 18 (10) (2006) 1338–1351. both in Chongqing, China. Currently, he is pursuing his Ph.D. degree in Vanderbilt
[33] M. Lan, C.L. Tan, J. Su, Y. Lu, Supervised and traditional term weighting methods University, and is expected to graduate in early 2019. From August to December in 2016,
for automatic text categorization, IEEE Transactions on Pattern Analysis and he interned at the National Aeronautics and Space Administration (NASA) Ames Research
Machine Intelligence 31 (4) (2009) 721–735. Center (ARC), Moffett Field, CA, working at the Prognostics Center of Excellence (PCoE)
[34] S. Tong, D. Koller, Support vector machine active learning with applications to text led by Dr. Kai Goebel. He was a recipient of the Chinese Government Award for
classification, Journal of Machine Learning Research 2 (Nov) (2001) 45–66. Outstanding Self-financed Students Abroad in 2017. He has published more than 30 re-
[35] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998. search papers in peer-reviewed journals, such as IEEE Transactions on Cybernetics, IEEE
[36] N. Li, D.D. Wu, Using text mining and sentiment analysis for online forums hotspot Transactions on Reliability, Decision Support Systems, Information Sciences, Safety
detection and forecast, Decision Support Systems 48 (2) (2010) 354–368. Science, Annals of Operations Research, Reliability Engineering and System Safety, and
[37] Y. Tang, Y.-Q. Zhang, N.V. Chawla, S. Krasser, SVMs modeling for highly im- International Journal of Production Research, among others. Four of his publications are
balanced classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B included in the Essential Science Indicators (ESI) highly cited papers. His current research
39 (1) (2009) 281–288. interests include uncertainty quantification, reliability assessment, risk analysis, network
[38] W. Zhang, T. Du, J. Wang, Deep learning over multi-field categorical data, optimization, and data analytics. He is a student member of IEEE, INFORMS, and SIAM.
European Conference on Information Retrieval, Springer, 2016, pp. 45–57.
[39] Y. Shen, X. He, J. Gao, L. Deng, G. Mesnil, A latent semantic model with con-
volutional-pooling structure for information retrieval, Proceedings of the 23rd ACM Sankaran Mahadevan is John R. Murray Sr. Professor of Engineering, and Professor of
International Conference on Conference on Information and Knowledge Civil and Environmental Engineering at Vanderbilt University, Nashville, Tennessee,
Management, ACM, 2014, pp. 101–110. where he has served since 1988. His research interests are in the areas of uncertainty
[40] V. López, A. Fernández, S. García, V. Palade, F. Herrera, An insight into classifi- quantification, model verification and validation, reliability and risk analysis, design
optimization, and system health monitoring, with applications to civil, mechanical and
cation with imbalanced data: empirical results and current trends on using data
intrinsic characteristics, Information Sciences 250 (2013) 113–141. aerospace systems. His research has been extensively funded by NSF, NASA, FAA, DOE,
[41] J.F. Díez-Pastor, J.J. Rodríguez, C. García-Osorio, L.I. Kuncheva, Random balance: DOD, DOT, NIST, GE, GM, Chrysler, Union Pacific, American Railroad Association, and
ensembles of variable priors classifiers for imbalanced data, Knowledge-Based Sandia, Idaho, Los Alamos and Oak Ridge National Laboratories. His research contribu-
Systems 85 (2015) 96–111. tions are documented in more than 600 publications, including two textbooks on relia-
bility methods and 280 journal papers. He has directed 42 Ph.D. dissertations and 24 M.
[42] N.V. Chawla, K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer, SMOTE: synthetic minority
over-sampling technique, Journal of Artificial Intelligence Research 16 (2002) S. theses, and has taught several industry short courses on reliability and risk analysis
methods. He is a Fellow of AIAA and EMI (ASCE). His awards include the NASA Next
321–357.
[43] S. Tan, Y. Li, H. Sun, Z. Guan, X. Yan, J. Bu, C. Chen, X. He, Interpreting the public Generation Design Tools award (NASA), the SAE Distinguished Probabilistic Methods
sentiment variations on Twitter, IEEE Transactions on Knowledge and Data Educator Award, SEC Faculty Award, and best paper awards in the MORS Journal and the
Engineering 26 (5) (2014) 1158–1170. SDM and IMAC conferences. Professor Mahadevan obtained his B.S. from Indian Institute
[44] A. McCallum, K. Nigam, et al., A comparison of event models for Naïve Bayes text of Technology, Kanpur, M.S. from Rensselaer Polytechnic Institute, Troy, NY, and Ph.D.
from Georgia Institute of Technology, Atlanta, GA.
classification, AAAI-98 workshop on learning for text categorization, vol. 752,