4.1 Notations and Definitions

Given a training data set D containing N training instances and M features, we define a distributed feature data set as a data set D that is vertically divided into W feature blocks with no overlapping features between blocks, that is, D = {B1, B2, ..., BW}, distributed over W files. Fi (1 ≤ i ≤ M) is a feature within Bj (1 ≤ j ≤ W), and C is the class attribute. Let P be a joint probability distribution on a set of random variables F via a directed acyclic graph G. We call the triplet ⟨F, G, P⟩ a Bayesian network if ⟨F, G, P⟩ satisfies the Markov condition: every variable is conditionally independent of any subset of its non-descendant variables given its parents [10].

In the rest of the paper, the terms variable and feature are used interchangeably. To measure the statistical relationships between features, we adopt the measure of mutual information [13]. Given two variables X and Y, the mutual information between X and Y is defined as follows:

    I(X; Y) = H(X) − H(X|Y)    (1)

where H(X) and H(X|Y) are computed as follows:

    H(X) = − Σ_{xi ∈ X} P(xi) log2 P(xi)    (2)

    H(X|Y) = − Σ_{yj ∈ Y} P(yj) Σ_{xi ∈ X} P(xi|yj) log2 P(xi|yj)    (3)

From Equations (1) to (3), the conditional mutual information is computed by

    I(X; Y|Z) = H(X|Z) − H(X|Y, Z)
              = Σ_{zi ∈ Z} P(zi) Σ_{xj ∈ X} Σ_{yk ∈ Y} P(xj, yk|zi) log2 [ P(xj, yk|zi) / (P(xj|zi) P(yk|zi)) ]    (4)

The lower-case symbols xi (xj), yj (yk), and zi in the above equations denote possible values that the variables X, Y, and Z take.

However, if a Bayesian network does not satisfy faithfulness, the Markov boundary of a node may not be unique [16]. We use Theorem 1 to explicitly construct and verify multiple Markov boundaries when the distribution P violates the faithfulness condition.

Theorem 1 [16]. If MB1 is a Markov boundary of T that contains a feature set S1, and there exists a feature set S2 such that MB2 = (MB1 − S1) ∪ S2 and I(T; MB1 | MB2) = 0, then MB2 is also a Markov boundary of T.

In Theorem 1, both MB1 and MB2 are Markov boundaries of T, since S1 ⊆ MB1 and S2 ⊆ MB2 carry equivalent information about T.

4.2 The Core Feature Space

By the design of the MB-DEA algorithm, the key to the algorithm is how to discover the core feature space from distributed feature data. The core feature space we are looking for is a feature space that contains all possible Markov boundaries of a target feature. From Theorem 1, we get Corollary 1 below.

Corollary 1. Assuming MB1 is a Markov boundary of T and MB2 = {MB1 − {Fi}} ∪ {Fj}, if I(T; Fj | Fi) = 0, then MB2 is also a Markov boundary of T.

Definition 3. Assuming MB(C) is a Markov boundary of C and Fi ∈ MB(C), we define ℜ(Fi) as the set of features such that every Fj ∈ ℜ(Fi) satisfies I(C; Fj | Fi) = 0.

By Corollary 1 and Definition 3, we define the core feature space of C as follows.

Definition 4 (Core feature space). Assuming MB(C) is a Markov boundary of C and Fi ∈ MB(C), the core feature space (CFS) of C is defined as

    CFS(C) = MB(C) ∪ ⋃_{Fi ∈ MB(C)} ℜ(Fi).

Figure 1: An example of MB(T) and CFS(T).

Figure 2: Discovery of the core feature space of C from distributed feature data.
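The quantities in Equations (1) to (4) can be estimated from discrete data by plug-in counts. The following is a minimal sketch (empirical estimates only; function names are ours, and it uses the identity I(X; Y|Z) = Σ_z P(z) I(X; Y | Z=z), which is equivalent to Eq. (4)):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical I(X;Y) over two aligned discrete samples (Eqs. 1-3)."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X;Y|Z), following Eq. (4): average I(X;Y|Z=z) weighted by P(z)."""
    n = len(zs)
    cmi = 0.0
    for z, cz in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        cmi += (cz / n) * mutual_information([xs[i] for i in idx],
                                             [ys[i] for i in idx])
    return cmi
```

For instance, two identical binary samples give I(X; Y) = 1 bit, while independent ones give 0, which is the independence test I(·;·|·) = 0 used throughout Section 4.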
To find the core feature space from distributed feature data, we further analyze the relationships between features Fi and Fj in Corollaries 2 to 4 as follows.

Corollary 2. A feature X is correlated to Y with respect to a feature subset S if I(X; Y | S) > 0.

Proof. By Eq. (4), the term I(X; Y | S) = 0 holds only when X and Y are conditionally independent given S. The corollary is proven accordingly.

Corollary 3. If Fi is correlated to C and I(C; Fj | Fi) = 0 holds, then I(Fi; C) > I(Fj; C).

Proof. By I(A; B | C) − I(A; B) = I(A; C | B) − I(A; C), we get I(C; Fj | Fi) − I(C; Fj) = I(C; Fi | Fj) − I(C; Fi). Since I(C; Fj | Fi) = 0 holds, I(C; Fi) = I(C; Fi | Fj) + I(C; Fj). By Corollary 2, the corollary is proven.

Corollary 4. If both MB1 and MB2 = (MB1 − {Fi}) ∪ {Fj} are Markov boundaries of C and I(Fj; C | Fi) = 0 holds, then I(Fi; Fj) ≥ max(I(Fi; C), I(Fj; C)).

Proof.

    I(Fj; Fi, C) = I(Fj; Fi) + I(Fj; C | Fi) = I(Fj; C) + I(Fj; Fi | C)    (5)

Since I(Fj; C | Fi) = 0 holds and Fi and Fj are correlated, we get I(Fi; Fj) ≥ I(Fj; C). At the same time, the following equation also holds:

    I(Fi; Fj, C) = I(Fi; Fj) + I(Fi; C | Fj) = I(Fi; C) + I(Fi; Fj | C)    (6)

By Corollary 1, we can also get I(Fi; C | Fj) = 0. By Eq. (6), we have I(Fi; Fj) ≥ I(Fi; C). With Equations (5) and (6), Corollary 4 is proved.

By Corollaries 3 and 4, we get the following observation.

Observation 1. If I(Fi; C) > I(Fj; C) and I(Fi; Fj) ≥ max(I(Fi; C), I(Fj; C)), then Fi is in a Markov boundary of C (MB(C)), and Fj is included in ℜ(Fi).

4.3 Discovery of Multiple Markov Boundaries from Distributed Feature Data

Figure 2 gives the new framework to efficiently find the core feature space of C from distributed feature data. In Figure 2, each feature block Bi is processed independently, one block at a time; as a feature block Bi arrives, the features in Bi are processed one by one in a sequential scan.

As illustrated in Figure 2, we discuss the pseudocode of the MB-DEA algorithm in Algorithm 1. To achieve a relatively low computational complexity in dealing with high-dimensional yet distributed tornado data, the MB-DEA algorithm uses online pairwise comparisons, according to Observation 1 and Corollaries 3 and 4, as the selection criterion for adding features to the core feature space.

Algorithm 1: The MB-DEA Algorithm
Data: D = {B1, B2, ..., BW}; MB(C) = {}; CFS(C) = {}; i = 0
 1  /* Steps 2-25: Mining the core feature space CFS */
 2  for j = 1 to W do
 3      Get-new-feature-set(Bj);
 4      repeat
 5          i = i + 1;
 6          Get-new-feature(Fi, Bj);
 7          if I(Fi; C) = 0 then
 8              Discard Fi; goto Step 23;
 9          end
10          for each feature Y ∈ MB(C) do
11              if I(Y; C) > I(Fi; C) and I(Fi; Y) ≥ max(I(Y; C), I(Fi; C)) then
12                  ℜ(Y) = ℜ(Y) ∪ {Fi};
13                  ℜ = ℜ ∪ {Fi};
14                  goto Step 23;
15              end
16              if I(Fi; C) > I(Y; C) and I(Fi; Y) ≥ max(I(Y; C), I(Fi; C)) then
17                  MB(C) = MB(C) − {Y};
18                  ℜ(Fi) = ℜ(Fi) ∪ {Y};
19                  ℜ = ℜ − ℜ(Y);
20              end
21          end
22          MB(C) = MB(C) ∪ {Fi};
23      until Fi is the last feature of Bj;
24  end
25  CFS(C) = MB(C) ∪ ℜ;
26  /* Steps 27-35: Mining Markov boundaries from CFS */
27  Use HITON PC to learn a Markov boundary MB1 of C from D over CFS(C) (the original distribution);
28  Output MB1;
29  repeat
30      Generate a data set De from the embedded distribution by removing a subset of features within the discovered Markov boundaries from CFS(C);
31      Use HITON PC to learn a Markov boundary MBnew of C from De;
32      if acc(MBnew) ≥ acc(MB1) then
33          output MBnew;
34      end
35  until all data sets De generated have been considered;
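The per-block scan of Steps 2-25 can be sketched as follows. This is a simplified sketch, not the authors' implementation: `mi_C` and `mi_pair` stand in for precomputed mutual-information scores, the helper names are ours, and ties between the two scores are left unhandled.

```python
def mine_cfs(blocks, mi_C, mi_pair):
    """Sketch of Steps 2-25: one sequential scan per feature block.

    blocks  : list of feature-name lists (B1 .. BW)
    mi_C    : dict, feature -> I(F; C)
    mi_pair : dict, frozenset({F, G}) -> I(F; G)
    Returns (MB, R); CFS(C) = MB plus the union of R's values.
    """
    MB, R = set(), {}                  # R[F]: features shadowed by F (Definition 3)
    for Bj in blocks:
        for Fi in Bj:
            if mi_C[Fi] == 0:          # Step 7: Fi irrelevant to C
                continue
            shadowed = False
            for Y in list(MB):
                link = mi_pair[frozenset({Fi, Y})]
                if link >= max(mi_C[Fi], mi_C[Y]):
                    if mi_C[Y] > mi_C[Fi]:        # Steps 11-14: Fi shadowed by Y
                        R.setdefault(Y, set()).add(Fi)
                        shadowed = True
                        break
                    if mi_C[Fi] > mi_C[Y]:        # Steps 16-20: prune Y from MB(C)
                        MB.discard(Y)
                        R.setdefault(Fi, set()).add(Y)
                        R.pop(Y, None)            # Step 19: drop R(Y) from R
            if not shadowed:
                MB.add(Fi)                         # Step 22
    return MB, R
```

Note that the result does not depend on the order in which a shadowing pair arrives: whichever of the two correlated features comes second, the one with the larger I(·; C) ends up in MB and the other in its ℜ set, matching Observation 1.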
In Algorithm 1, ℜ(Fi) keeps the set of features correlated to Fi that satisfies Definition 3; ℜ dynamically keeps the features in ⋃_{i=1}^{|MB(C)|} ℜ(Fi); MB(C) is the Markov boundary of C maintained by the scan; and CFS(C) denotes the core feature space of C given in Definition 4. Two key stages are given in Algorithm 1. One is to mine CFS(C) in Steps 2 to 25, and the other is to discover Markov boundaries from CFS(C) in Steps 27 to 35.

Steps 2 to 25. As a feature set Bj arrives, the features within Bj are processed one by one in a sequential scan. Step 7 determines whether a newly arriving feature Fi is correlated to C: if I(Fi; C) > 0, we consider Fi correlated to C; otherwise, Fi is discarded.

If Fi is correlated to C, Step 11 determines whether to keep Fi given the current set MB(C). If Fi is not kept, it is added to ℜ and ℜ(Y), respectively. If Fi can be added to MB(C) at Step 11, Step 16 further prunes MB(C) due to Fi's inclusion. If Y in MB(C) is removed, the set ℜ(Y) is removed from ℜ accordingly; meanwhile, Y is added to ℜ(Fi). With the MB-DEA algorithm, we can thus discover ℜ without any additional computation costs.

Steps 27 to 35. We integrate the TIE* algorithm into our MB-DEA algorithm to mine Markov boundaries from the discovered core feature space. The main idea of TIE* [16] is to first identify a Markov boundary of a target feature T in the original data distribution, and then iteratively run a single Markov boundary induction algorithm on the embedded distributions obtained by removing subsets of features from the original Markov boundary, in order to identify new Markov boundaries of the original distribution. From Steps 27 to 35, with the core feature space, the TIE* algorithm can search for multiple Markov boundaries in a smaller feature space.

Step 27 uses a single Markov boundary induction algorithm, HITON PC [1], to learn a Markov boundary, called MB1, from the data set D defined on the feature set CFS(C) (i.e., in the original distribution). Step 30 generates a data set De (the embedded distribution) by removing a subset of features from CFS(C) (regarding how to generate an embedded distribution, please see the IGS algorithm in [16], page 18, Figure 9). The motivation is that De may lead to the identification of a new Markov boundary of T that was previously invisible to a single Markov boundary induction algorithm because it was shielded by another subset of features within the discovered Markov boundaries. Step 32 uses prediction accuracy as a criterion to verify whether a feature set discovered from the embedded distribution is a new Markov boundary or not. If the prediction accuracy of MBnew in Step 32 is not less than that of MB1, MBnew is also considered a Markov boundary of C. Steps 30-34 are repeated until all generated data sets De have been considered.

5. EMPIRICAL STUDY

In this empirical study, since the state-of-the-art multiple Markov boundary discovery algorithm, the TIE* algorithm [16], cannot deal with the high-dimensional yet distributed tornado data set, we first validate the effectiveness and efficiency of the MB-DEA algorithm against the TIE* algorithm using benchmark data sets, and then use the MB-DEA algorithm for tornado prediction.

5.1 Experiments on the Benchmark Data Sets

Table 2: Summary of the benchmark data sets
  #  Data set         # features   # training set   # testing set
  1  arcene               10,000              100             100
  2  dexter               20,000              300             300
  3  dorothea            100,000              800             300
  4  colon                 2,000               42              20
  5  leukemia              7,129               48              24
  6  lung-cancer          12,533              121              60
  7  ovarian-cancer        2,190              144              72
  8  thrombin            139,351            2,000             543

We have chosen eight benchmark data sets as described in Table 2, which cover a wide range of real-world application domains. In Table 2, for the first four NIPS 2003 challenge data sets and the hiva data set from the WCCI 2006 performance prediction challenge, we use the originally provided training and validation sets; for the other five data sets we adopt the first 2/3 of the instances for training and the last 1/3 for testing. For discrete data sets, we use mutual information to compute the correlation between features, while for continuous data sets Fisher's Z-test is employed [2], with the significance level set to 0.01. All experiments on the benchmark data sets are conducted on a computer with an Intel(R) i7-3770 3.4GHz CPU and 32GB memory.

In our experiments, each training data set in Table 2 is partitioned evenly into ten feature sets to simulate distributed feature data for MB-DEA, while TIE* is run directly on each training data set. TIE* is parameterized with HITON PC [1] as the base Markov blanket induction algorithm, and classification accuracy is used as the criterion to verify whether a new feature subset is a Markov boundary or not. The parameter alpha of HITON PC is set to 0.01.

In the following figures and tables, we report the comparison results of MB-DEA and TIE*. The running time on the dexter and thrombin data sets exceeds 3 days for TIE*, so we do not show the experimental results of TIE* on those two data sets.

Table 3 reports the F-ratio and MB-ratio. The F-ratio is the number of features in the core feature space discovered by MB-DEA divided by the total number of features of the same data set. The MB-ratio is the ratio of the number of Markov boundaries discovered by MB-DEA to the number of Markov boundaries identified by TIE*. In Table 3, we can see that on each data set, the number of features within the core feature space is a very small fraction of the total number of features. Furthermore, from the MB-ratio, except for the colon data set, MB-DEA discovers the same number of Markov boundaries as TIE* on the remaining data sets. For the dexter and thrombin data sets, we do not have MB-ratios due to the long running time of TIE*, and denote them as "-".

Meanwhile, Figure 3 gives the prediction (classification) accuracies of MB-DEA against TIE* using the K Nearest Neighbor classifier. We select the highest prediction accuracy among all of the Markov boundaries discovered by MB-DEA and TIE*, respectively. From Figure 3, we can see that with the core feature space, MB-DEA gets the same prediction accuracy as TIE*.
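The even vertical partition used above to simulate distributed feature data can be sketched as follows. The paper does not specify the exact partitioning scheme; this is a simple round-robin split, and the function name is ours.

```python
def split_into_blocks(feature_names, w=10):
    """Vertically partition features into w disjoint blocks (B1 .. Bw).

    Round-robin assignment, so block sizes differ by at most one feature.
    """
    blocks = [[] for _ in range(w)]
    for k, name in enumerate(feature_names):
        blocks[k % w].append(name)
    return blocks
```

Each resulting block plays the role of one Bj in Algorithm 1, which processes the blocks one at a time without ever loading the full feature set.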
Table 3: The F-ratio and MB-ratio
  Data set         F-ratio   MB-ratio
  arcene           0.1157    1
  dexter           0.0057    -
  dorothea         0.0636    1
  colon            0.033     0.5
  leukemia         0.0537    1
  lung-cancer      0.0608    1
  thrombin         0.0249    -
  ovarian-cancer   0.1685    1

Table 4: Efficiency of MB-DEA vs. TIE* (seconds)
  Data set         TIE*     MB-DEA
  arcene           19       6
  dexter           -        17,091
  dorothea         6,465    25
  colon            5        1
  leukemia         15       6
  lung-cancer      79       16
  thrombin         -        24,935
  ovarian-cancer   6        3

Figure 3: Prediction (classification) accuracies of MB-DEA against TIE*.

Although MB-DEA does not find all of the Markov boundaries on the colon data set, it still obtains the Markov boundaries with the highest prediction accuracies.

Figure 4 gives the running time of the discovery of the core feature space in Steps 2 to 25 of Algorithm 1. From Figure 4, we can see that the computational cost of discovering the core feature space is very low, because the ℜ feature set (see Definition 3) is identified without additional time costs. Table 4 gives the running time of MB-DEA against TIE*. From Table 4, we can see that, in terms of efficiency, MB-DEA is faster than TIE* on every data set on which TIE* completed. From the dorothea, dexter, and thrombin data sets in Table 4, with the core feature space, MB-DEA can efficiently deal with data of very high dimensionality, while TIE* is very expensive, or even infeasible, due to its exhaustive search over the entire feature space.

In summary, with the above results, we can conclude that the core feature space the MB-DEA algorithm discovers is only a small fraction of the entire feature space, yet MB-DEA achieves a very promising prediction accuracy with much less running time.

Figure 4: Running time of the discovery of the core feature space. The labels of the x-axis from 1 to 8 denote the data sets: 1. arcene, 2. dexter, 3. dorothea, 4. colon, 5. leukemia, 6. lung-cancer, 7. thrombin, 8. ovarian-cancer.

5.2.1 Experiment Setup

Tornadoes are rare events even during the most frequent months. The positive to negative instance ratio in our study is about 1:30. We use both accuracy (Equation 7) and the F1 score (Equation 8) as the performance metrics in our empirical study.

    Accuracy = (TP + TN) / (TP + TN + FP + FN)    (7)

    F1 = 2TP / (2TP + FP + FN)    (8)

where TP is the number of true positives, FP the number of false positives, TN the number of true negatives, and FN the number of false negatives.

We divide the data set into two parts: 30 years (1979-2008; 2,610 days with 81 positive instances) as the training set, and 5 years (2009-2013; 435 days with 17 positive instances) as the test set. As illustrated in Figure 5, we randomly undersample the negative instances to achieve class ratios of positive:negative (P/N) examples equal to 1:5, 1:10, 1:15, 1:20, and 1:25, respectively. At each ratio level, we perform the random sampling 10 times and run the MB-DEA algorithm on the corresponding training data sets to identify tornado precursors from multiple Markov boundaries. We apply the tornado precursors to the K Nearest Neighbor classifier with K=1 on the test set. The average and best prediction results are reported in Table 5, in which the average values are calculated using the best precursors identified from each training data set. We plot the experimental results in Figures 6 and 7.

5.2.2 Results and Discussion

Our goal is to identify key interpretable tornado precursors that can later be used in forecasting models. As discussed earlier, we start from a single Markov boundary, then derive a core feature set from it, and eventually identify the best tornado precursors. The average F1 values of the best tornado precursor sets are much higher than those of the initial single Markov boundaries at all the sampling levels (Table 5).
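The undersampling protocol of Section 5.2.1 can be sketched as follows (an illustrative sketch; the function name and data layout are ours): keep all positive instances and draw a random subset of negatives so that the class ratio is about 1:ratio.

```python
import random

def undersample(instances, labels, ratio, seed=None):
    """Keep all positives and a random sample of negatives at ~1:ratio.

    labels are 1 (tornado day) or 0 (non-tornado day).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(neg), ratio * len(pos)))
    keep = sorted(pos + kept_neg)
    return [instances[i] for i in keep], [labels[i] for i in keep]
```

Repeating this draw with different seeds at each ratio level (10 times per level in the paper) yields the multiple training sets fed to MB-DEA.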
Table 5: The average and best prediction results
  P/N Ratio               Ave. F1 of MB(C)   Ave. Size of CFS(C)   Ave. Acc.   Ave. F1   Best Acc.   Best F1
  1:5                     0.1041             269                   0.9097      0.1907    0.9241      0.2667
  1:10                    0.1333             217                   0.9218      0.2440    0.9253      0.2917
  1:15                    0.1579             233                   0.9296      0.2688    0.9448      0.2941
  1:20                    0.1739             243                   0.9255      0.3129    0.9310      0.3478
  1:25                    0.1778             245                   0.9264      0.3018    0.9402      0.3500
  Without undersampling   0.1429             238                   0.9218      0.2778    -           -
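The two metrics of Equations (7) and (8) are straightforward to compute; as a sanity check, plugging in the best reported confusion counts (7 of 17 tornado days predicted, 16 false positives over the 435-day test period, hence TN = 402 and FN = 10) reproduces the best row of Table 5:

```python
def accuracy(tp, tn, fp, fn):
    """Equation (7): fraction of correctly classified days."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    """Equation (8): harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

# Best result at P/N ratio 1:25: accuracy ≈ 0.9402, F1 = 0.35
print(accuracy(7, 402, 16, 10), f1_score(7, 16, 10))
```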
Figure 5: Flow chart of the experimental process. Using an undersampled data set as input, MB-DEA outputs multiple Markov boundaries, and a best Markov boundary, i.e., the tornado precursor set, can be selected according to the measures of interest.

The best average forecasting result (average accuracy = 0.93, average F1 = 0.31) is achieved at the undersampling level of P/N ratio 1:20, and the best single prediction result (accuracy = 0.94, F1 = 0.35) is achieved with a P/N ratio of 1:25, in which we successfully predicted 7 tornado events one day ahead out of 17 during the testing period (MAM 2009-2013, 435 days), with 16 false positives. To the best of our knowledge, our result is among the most promising daily-level tornado forecasting results compared to the state-of-the-art algorithms [3].

Figure 6 uses our empirical results to explain how MB-DEA works. We plot the single Markov boundary (MB(C) in Algorithm 1), the core feature set CFS(C), and the tornado precursors (the best Markov boundary according to the criterion of prediction power) with respect to the experiment having the best prediction result (F1 = 0.35). First, the single Markov boundary MB(C) is learned from the tornado data set (Figure 6a); then the core feature set CFS(C) is built based on MB(C) (Figures 6b-6e); finally, the best Markov boundary according to prediction power is reported (Figure 6f). Different variables in the core feature set CFS(C) fall into clusters in the spatial domain (blue circles in Figures 6b-6e). The arrows from Figure 6a to Figure 6d show how a feature from the field of Pressure Vertical Velocity helps to generate a cluster of related features (mostly from the fields of Pressure Vertical Velocity and Relative Humidity), and later contributes to the identification of precursor 12 (Figure 6f). Variables from the same field that are close to each other in the spatial domain tend to have similar values due to the spatial autocorrelation effect [20]. These spatial clusters can be considered real-world examples of the CFS(C) illustrated in Figure 2. Our algorithm picks individual precursors out of each cluster and successfully finds the best combination (Markov boundary) for the forecasting task.

As an algorithm for discovering multiple Markov boundaries, MB-DEA is not limited to finding the feature set with the best prediction result. MB-DEA is able to report different precursor sets with similar prediction power according to other domain interests. For example, compared to other variables in the tornado data set, Pressure Vertical Velocity is considered a noisy and unreliable variable. In the feature set with the best prediction result (Figure 6f; b-set for short), we have two features (No.7 and No.12) from the field of Pressure Vertical Velocity. From all the MB-DEA outputs generated from the same input data set, we are able to find a feature set whose features are all from fields other than Pressure Vertical Velocity (Figure 7; we call it the most reliable set, or m-set for short) and that has similar prediction power (F1 = 0.3265). From Figure 6f and Figure 7, we find that more than half of the features in the two sets (No.1, 2, 3, 4, 8, 10, and 11) are exactly the same, and the others (except No.6) are from the same spatial clusters in the core feature set. In the m-set, instead of Pressure Vertical Velocity, variables from Relative Humidity are selected (No.5 and No.7 in the m-set, compared to No.12 and No.7 in the b-set, respectively). MB-DEA provides a new approach to bridging data-driven and knowledge-driven processes for better data interpretation to be used in various forecasting models.

6. CONCLUSION

The development of reliable tornado forecasting techniques able to provide warnings at a long lead time is crucial to helping people better prepare for and respond to disastrous events. In this paper, we propose the MB-DEA algorithm to identify multiple sets of precursors for reliable tornado forecasting using multiple Markov boundaries. Our empirical study reveals that the precursors identified by our algorithm are more reliable and practical than the ones identified by a single Markov boundary discovery algorithm, and lead to advancements in reliable, long-lead-time forecasting of catastrophic tornadoes.

7. ACKNOWLEDGMENTS

This work is partly supported by a PIMS Post-Doctoral Fellowship Award of the Pacific Institute for the Mathematical Sciences, Canada; the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China (under grant IRT13059); the National 973 Program of China (under grant 2013CB329604); the National Natural Science Foundation of China (under grants 61229301 and 61305064); an NSERC Discovery grant; and a BCIC NRAS Team Project. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

Figure 6: A real-world example of how MB-DEA works. MB-DEA first discovers a single Markov boundary (Figure (a): MB(C) from Algorithm 1, F1 = 0.1905). The partial CFS (Figures (b) to (e)) shows feature clusters in the spatial domain built from the features of MB(C); finally, the best feature set (b-set, Figure (f)), according to prediction power, is found from the CFS. The red square in each map is the target area for tornado labeling.

Figure 7: The most reliable feature set (according to the types of variables in the set). The red square in the map is the target area for tornado labeling.
8. REFERENCES

[1] C. F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos. Local causal and Markov blanket induction for causal discovery and feature selection for classification part I: Algorithms and empirical evaluation. Journal of Machine Learning Research, 11:171-234, 2010.
[2] T. W. Anderson. An introduction to multivariate statistical analysis, volume 2. Wiley, New York, 1958.
[3] J. Brotzge and W. Donner. The tornado warning process: A review of current research, challenges, and opportunities. Bulletin of the American Meteorological Society, 94(11):1715-1733, 2013.
[4] A. J. Clark, J. S. Kain, P. T. Marsh, J. Correia Jr, M. Xue, and F. Kong. Forecasting tornado pathlengths using a three-dimensional object identification algorithm applied to convection-allowing forecasts. Weather and Forecasting, 27(5):1090-1113, 2012.
[5] C. P. de Campos and Q. Ji. Efficient structure learning of Bayesian networks using constraints. Journal of Machine Learning Research, 12:663-689, 2011.
[6] C. A. Doswell, H. E. Brooks, and N. Dotzek. On the implementation of the enhanced Fujita scale in the USA. Atmospheric Research, 93(1):554-563, 2009.
[7] J. H. Faghmous and V. Kumar. A big data guide to understanding climate change: The case for theory-guided data science. Big Data, 2(3):155-163, 2014.
[8] I. M. Held, R. T. Pierrehumbert, S. T. Garner, and K. L. Swanson. Surface quasi-geostrophic dynamics. Journal of Fluid Mechanics, 282:1-20, 1995.
[9] E. Kalnay, M. Kanamitsu, R. Kistler, W. Collins, D. Deaven, L. Gandin, M. Iredell, S. Saha, G. White, J. Woollen, et al. The NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society, 77(3):437-471, 1996.
[10] J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
[11] J. M. Peña, R. Nilsson, J. Björkegren, and J. Tegnér. Towards scalable and data efficient learning of Markov boundaries. International Journal of Approximate Reasoning, 45(2):211-232, 2007.
[12] C. M. Shafer, A. E. Mercer, L. M. Leslie, M. B. Richman, and C. A. Doswell III. Evaluation of WRF model simulations of tornadic and nontornadic outbreaks occurring in the spring and fall. Monthly Weather Review, 138(11):4098-4119, 2010.
[13] C. E. Shannon. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1):3-55, 2001.
[14] K. Simmons and D. Sutter. Economic and societal impacts of tornadoes. Springer Science & Business Media, 2013.
[15] N. Snook and M. Xue. Effects of microphysical drop size distribution on tornadogenesis in supercell thunderstorms. Geophysical Research Letters, 35(24), 2008.
[16] A. Statnikov, J. Lemeir, and C. F. Aliferis. Algorithms for discovery of multiple Markov boundaries. Journal of Machine Learning Research, 14(1):499-566, 2013.
[17] I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31-78, 2006.
[18] X. Wu, K. Yu, W. Ding, H. Wang, and X. Zhu. Online feature selection with streaming features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(5):1178-1192, 2013.
[19] K. Yu, X. Wu, Z. Zhang, Y. Mu, H. Wang, and W. Ding. Markov blanket feature selection with non-faithful data distributions. In IEEE ICDM '13, pages 857-866. IEEE, 2013.
[20] P. Zhang, Y. Huang, S. Shekhar, and V. Kumar. Exploiting spatial autocorrelation to efficiently process correlation-based similarity queries. In Advances in Spatial and Temporal Databases, pages 449-468. 2003.