
Studies in Computational Intelligence 808

Vladik Kreinovich
Songsak Sriboonchitta
Editors

Structural Changes and their Econometric Modeling

Studies in Computational Intelligence

Volume 808

Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: kacprzyk@ibspan.waw.pl
The series “Studies in Computational Intelligence” (SCI) publishes new develop-
ments and advances in the various areas of computational intelligence—quickly and
with a high quality. The intent is to cover the theory, applications, and design
methods of computational intelligence, as embedded in the fields of engineering,
computer science, physics and life sciences, as well as the methodologies behind
them. The series contains monographs, lecture notes and edited volumes in
computational intelligence spanning the areas of neural networks, connectionist
systems, genetic algorithms, evolutionary computation, artificial intelligence,
cellular automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.

More information about this series at http://www.springer.com/series/7092


Vladik Kreinovich
Songsak Sriboonchitta
Editors

Structural Changes and their Econometric Modeling

Editors

Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
El Paso, TX, USA

Songsak Sriboonchitta
Faculty of Economics
Chiang Mai University
Chiang Mai, Thailand

ISSN 1860-949X ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-04262-2 ISBN 978-3-030-04263-9 (eBook)
https://doi.org/10.1007/978-3-030-04263-9

Library of Congress Control Number: 2018960914

© Springer Nature Switzerland AG 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Contents

General Theory
The Replacement for Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . 3
William M. Briggs, Hung T. Nguyen, and David Trafimow
On Quantum Probability Calculus for Modeling
Economic Decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Hung T. Nguyen, Songsak Sriboonchitta, and Nguyen Ngoc Thach
My Ban on Null Hypothesis Significance Testing
and Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
David Trafimow
Kalman Filter and Structural Change Revisited: An Application
to Foreign Trade-Economic Growth Nexus . . . . . . . . . . . . . . . . . . . . . . 49
Omorogbe Joseph Asemota
Statisticians Should Not Tell Scientists What to Think . . . . . . . . . . . . . . 63
Donald Bamber
Bayesian Modelling Structural Changes on Housing Price Dynamics . . . 83
Hong Than-Thi, Manh Cuong Dong, and Cathy W. S. Chen
Cumulative Residual Entropy-Based Goodness of Fit Test
for Location-Scale Time Series Model . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Sangyeol Lee
The Quantum Formalism in Social Science: A Brief Excursion . . . . . . . 116
Emmanuel Haven
How Annualized Wavelet Trading “Beats” the Market . . . . . . . . . . . . . 124
Lanh Tran
Flexible Constructions for Bivariate Copulas Emphasizing
Local Dependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Xiaonan Zhu, Qingsong Shan, Suttisak Wisadwongsa, and Tonghui Wang

Desired Sample Size for Estimating the Skewness Under Skew
Normal Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Cong Wang, Tonghui Wang, David Trafimow, and Hunter A. Myüz
Why the Best Predictive Models Are Often Different from the Best
Explanatory Models: A Theoretical Explanation . . . . . . . . . . . . . . . . . . 163
Songsak Sriboonchitta, Luc Longpré, Vladik Kreinovich,
and Thongchai Dumrongpokaphan
Algorithmic Need for Subcopulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Thach Ngoc Nguyen, Olga Kosheleva, Vladik Kreinovich,
and Hoang Phuong Nguyen
How to Take Expert Uncertainty into Account: Economic Approach
Illustrated by Pavement Engineering Applications . . . . . . . . . . . . . . . . . 182
Edgar Daniel Rodriguez Velasquez, Carlos M. Chang Albitres,
Thach Ngoc Nguyen, Olga Kosheleva, and Vladik Kreinovich
Quantum Approach Explains the Need for Expert Knowledge:
On the Example of Econometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Songsak Sriboonchitta, Hung T. Nguyen, Olga Kosheleva,
Vladik Kreinovich, and Thach Ngoc Nguyen

Applications
Monetary Policy Shocks and Macroeconomic Variables: Evidence
from Thailand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Popkarn Arwatchanakarn
Thailand’s Household Income Inequality Revisited: Evidence
from Decomposition Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
Natthaphat Kingnetr, Supanika Leurcharusmee, and Songsak Sriboonchitta
Simultaneous Confidence Intervals for All Differences of Variances
of Log-Normal Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Warisa Thangjai and Suparat Niwitpong
Confidence Intervals for the Inverse Mean and Difference of Inverse
Means of Normal Distributions with Unknown Coefficients
of Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Warisa Thangjai, Sa-Aat Niwitpong, and Suparat Niwitpong
Confidence Intervals for the Mean of Delta-Lognormal Distribution . . . 264
Patcharee Maneerat, Sa-Aat Niwitpong, and Suparat Niwitpong
The Interaction Between Fiscal Policy, Macroprudential Policy
and Financial Stability in Vietnam-An Application of Structural
Equation Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
Nguyen Ngoc Thach, Tran Thi Kim Oanh, and Huynh Ngoc Chuong
Using Confirmation Factor Analysis to Construct a Financial Stability
Index for Vietnam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Nguyen Ngoc Thach, Tran Thi Kim Oanh, and Huynh Ngoc Chuong
Mercury Retrograde and Stock Market Returns in Vietnam . . . . . . . . . 303
Nguyen Ngoc Thach and Nguyen Van Diep
Modeling Persistent and Periodic Weekly Rainfall in an Environment
of an Emerging Sri Lankan Economy . . . . . . . . . . . . . . . . . . . . . . . . . . 314
H. P. T. N. Silva, G. S. Dissanayake, and T. S. G. Peiris
Value at Risk of SET Returns Based on Bayesian Markov-Switching
GARCH Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Petchaluck Boonyakunakorn, Pathairat Pastpipatkul,
and Songsak Sriboonchitta
Benfordness of Chains of Truncated Beta Distributions via a Piecewise
Constant Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
Tippawan Santiwipanont, Songkiat Sumetkijakan,
and Teerapot Wiriyakraikul
Confidence Intervals for Coefficient of Variation of Three
Parameters Delta-Lognormal Distribution . . . . . . . . . . . . . . . . . . . . . . . 352
Noppadon Yosboonruang, Suparat Niwitpong, and Sa-Aat Niwitpong
Confidence Intervals for Difference Between Means and Ratio
of Means of Weibull Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
Manussaya La-ongkaew, Sa-Aat Niwitpong, and Suparat Niwitpong
Trading Signal Analysis with Pairs Trading Strategy in the Stock
Exchange of Thailand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Natnarong Namwong, Woraphon Yamaka, and Roengchai Tansuchat
Technical Efficiency Analysis of Tourism and Logistics in ASEAN:
Comparing Bootstrapping DEA and Stochastic Frontier Analysis
Based Decision on Copula Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Chanamart Intapan, Songsak Sriboonchitta, Chukiat Chaiboonsri,
and Pairach Piboonrungroj
Estimating the Difference in the Percentiles of Two Delta-Lognormal
Independent Populations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
Maneerat Jaithun, Sa-Aat Niwitpong, and Suparat Niwitpong
Impacts of Global Market Volatility and US Dollar on Agricultural
Commodity Futures Prices: A Panel Cointegration Approach . . . . . . . . 412
Khunanont Lerkeitthamrong, Chatchai Khiewngamdee,
and Rossarin Osathanunkul
An Analysis of the Impact of the Digital Economy on Change
in Thailand’s Economic Trends Using Dynamic Stochastic
General Equilibrium (DSGE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
Chaiwat Klinlampu, Chukiat Chaiboonsri, Anuphak Saosaovaphak,
and Jirakom Sirisrisakulchai
A Regime Switching Skew-Distribution Model of Contagion . . . . . . . . . 439
Woraphon Yamaka, Payap Tarkhamtham, Paravee Maneejuk,
and Songsak Sriboonchitta
Structural Breaks Dependence Analysis of Oil, Natural Gas,
and Heating Oil: A Vine-Copula Approach . . . . . . . . . . . . . . . . . . . . . . 451
Nopasit Chakpitak, Payap Tarkhamtham, Woraphon Yamaka,
and Songsak Sriboonchitta
Markov Switching Constant Conditional Correlation GARCH
Models for Hedging on Gold and Crude Oil . . . . . . . . . . . . . . . . . . . . . 463
Noppasit Chakpitak, Pichayakone Rakpho, and Woraphon Yamaka
Portfolio Optimization of Stock, Oil and Gold Returns: A Mixed
Copula-Based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
Sukrit Thongkairat, Woraphon Yamaka, and Nopasit Chakpitak
Markov Switching Quantile Model Unknown tau Energy Stocks Price
Index Thailand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
Pichayakone Rakpho, Woraphon Yamaka, and Songsak Sriboonchitta
Modeling the Dependence Dynamics and Risk Spillovers for G7
Stock Markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
Noppasit Chakpitak, Rungrapee Phadkantha, and Woraphon Yamaka
A Regime Switching Vector Error Correction Model of Analysis
of Cointegration in Oil, Gold, Stock Markets . . . . . . . . . . . . . . . . . . . . . 514
Sukrit Thongkairat, Woraphon Yamaka, and Songsak Sriboonchitta
A Regime Switching Time-Varying Copula Approach to Oil and Stock
Markets Dependence: The Case of G7 Economies . . . . . . . . . . . . . . . . . 525
Rungrapee Phadkantha, Woraphon Yamaka, and Songsak Sriboonchitta
Forecasting Exchange Rate with Linear and Non-linear
Vector Autoregressive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
Rungrapee Phadkantha, Woraphon Yamaka, and Songsak Sriboonchitta
The Impacts of Macroeconomic Variables on Economic Growth:
Evidence from China, Japan, and South Korea . . . . . . . . . . . . . . . . . . . 552
Wilawan Srichaikul, Woraphon Yamaka, and Songsak Sriboonchitta
Determinants of Foreign Direct Investment Inflow in ASEAN
Countries: Panel Threshold Approach and Panel Smooth Transition
Regression Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
Noppasit Chakpitak, Wilawan Srichaikul, Woraphon Yamaka,
and Songsak Sriboonchitta
Predictive Recursion Maximum Likelihood for Kink
Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572
Noppasit Chakpitak, Woraphon Yamaka, and Paravee Maneejuk
Bayesian Extreme Value Optimization Algorithm: Application
to Forecast the Rubber Futures in Futures Exchange Markets . . . . . . . 582
Arisara Romyen, Satawat Wannapan, and Chukiat Chaiboonsri
Measuring U.S. Business Cycle Using Markov-Switching Model:
A Comparison Between Empirical Likelihood Estimation and
Parametric Estimations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596
Paravee Maneejuk, Woraphon Yamaka, and Songsak Sriboonchitta
Analysis of Small and Medium-Sized Enterprises’ Insolvency
Probability by Financial Statements Using Probit Kink Model:
Manufacture Sector in Songkhla Province, Thailand . . . . . . . . . . . . . . . 607
Chalerm Jaitang, Paravee Maneejuk, Aree Wiboonpongse,
and Songsak Sriboonchitta
Frequency Domain Causality Analysis of Stock Market and Economic
Activites in Vietnam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620
Nguyen Ngoc Thach, Le Hoang Anh, and Ha Thi Nhu Phuong
Investigating Structural Dependence in Natural Rubber Supplys
Based on Entropy Analyses and Copulas . . . . . . . . . . . . . . . . . . . . . . . . 639
Kewalin Somboon, Chukiat Chaiboonsri, Satawat Wannapan,
and Songsak Sriboonchitta
The Dependence Between International Crude Oil Price and Vietnam
Stock Market: Nonlinear Cointegration Test Approach . . . . . . . . . . . . . 648
Le Hoang Anh, Tran Phuoc, and Ha Thi Nhu Phuong
Stability of Vietnam Money Demand Function: An Empirical
Application of Multiple Testing with a Structural Break . . . . . . . . . . . . 670
Bui Quang Hien and Pham Dinh Long
Analytic on Long-Run Equilibrium Between Thailand’s Economy
and Business Tourism (MICE) Industry Using Bayesian Inference . . . . 684
Chanamart Intapan, Songsak Sriboonchitta, Chukiat Chaiboonsri,
and Pairach Piboonrungroj
Technical Efficiency Analysis of Top Agriculture Producing Countries
in Asia: Zero Inefficiency Meta-Frontier Approach . . . . . . . . . . . . . . . . 702
Jianxu Liu, Hui Li, Songsak Sriboonchitta, and Sanzidur Rahman
Technical Efficiency Analysis of Agricultural Production of BRIC
Countries and the United States of America: A Copula-Based
Meta-Frontier Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
Jianxu Liu, Yangnan Cheng, Sanzidur Rahman, and Songsak Sriboonchitta
Comparisons of Confidence Interval for a Ratio of Non-normal
Variances Using a Kurtosis Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . 745
Channarong Wongyai and Sirima Suwan
An Analysis of Stock Market Cycle with Markov Switching
and Kink Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756
Konnika Palason and Roengchai Tansuchat
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775
General Theory
The Replacement for Hypothesis Testing

William M. Briggs1(B), Hung T. Nguyen2,3, and David Trafimow2

1 New York, USA
  matt@wmbriggs.com
2 New Mexico State University, Las Cruces, USA
  {hunguyen,dtrafimo}@nmsu.edu
3 Chiang Mai University, Chiang Mai, Thailand

Abstract. Classical hypothesis testing, whether with p-values or Bayes
factors, leads to over-certainty, and produces the false idea that causes
have been identified via statistical methods. The limitations and abuses
of in particular p-values are so well known and by now so egregious,
that a new method is badly in need. We propose returning to an old
idea, making direct predictions by models of observables, assessing the
value of evidence by the change in predictive ability, and then verify-
ing the predictions against reality. The latter step is badly in need of
implementation.

Keywords: P-values · Hypothesis testing · Model selection · Model validation · Predictive probability

1 The Nature of Testing

The plain meaning of hypothesis testing is to ascertain whether, or to what
degree, certain hypotheses are true or false, or if a theory is good or bad, or
useful or not. This is not, of course, what that phrase means in frequentist or
Bayesian theory. Classical statistical philosophy has developed measures, such as
p-values and Bayes factors, which are not directly related to the plain meaning.
Yet the plain meaning is what all seek to know.
The relationship between a theory’s truth or goodness and p-values is non-
existent by design. The connection between a theory’s truth and Bayes factors is
more natural, e.g. Mulder and Wagenmakers (2016), but because Bayes factors
focus on unobservable parameters, they exaggerate evidence for or against a
theory (we demonstrate this presently). The predictive approach outlined below
restores, and puts into proper perspective, the natural goals of modeling.
The two main goals of modeling physical observables are prediction and
explanation, i.e. understanding the causes of the phenomenon of interest. With-
out delving too deeply into a highly complex subject, it should be obvious that
if we knew the cause or causes of an observable, we would write these down and
not need a probability model, see Briggs (2016). Probability models are only
needed when causes are unknown, at least in some degree. Though there is some
disagreement on the topic, e.g. Hitchcock (2016), Breiman (2001), and though
the reader need not agree here, we suggest that there is no ability for a wholly
statistical model to identify cause. Everybody agrees models can, and do, find
correlations. And because correlations are not causes, hypothesis testing cannot
find causes, nor does it claim to. At best, hypothesis testing highlights possibly
interesting relationships.
Now every statistician knows these arguments, and agrees with them to vary-
ing extent (most disputes are about the nature of cause, e.g. Pearl (2000)). But
the “civilians” who use the tools statisticians develop have not well assimilated
the arcane philosophy behind those tools. Civilians all too often assume that if
a hypothesis test has been “passed”, a causal effect—or something very like it,
like a “link” (a word nowhere defined)—has been confirmed. This is only natural
given the name: hypothesis test. This explains the overarching desire for p-value
hacking and the like. The result is massive over-certainty and a reproducibility
crisis, e.g. see among many others Begley and Ioannidis (2015); see too Nosek
et al. (2015).
This leaves prediction. Prediction makes sense and is understandable to
everybody, and best of all opens all models to verification, to real testing. A
hard check against reality is not the usual treatment statistical models receive.
This is a shame. The many benefits of prediction are detailed below.
There is not much point here adding to the critiques of p-values. Not every
argument against them is well known, but enough are in common circulation
that even their most resolute defenders are given pause, e.g. Nguyen (2016),
Trafimow and Marks (2015). The only good use for p-values is the one for which
they are designed: calculating the probability that certain functions of data will
exceed some value supposing a specified probability model holds. About whether
that, or any other, model is good, true, or useful, the p-value is utterly silent.
It’s funny, then, that the only uses to which p-values are put are on questions
they can’t answer.
The majority—which includes all users of statistical models, not just careful
academics—treat p-values like ritual, e.g. Gigerenzer (2004). If the p-value is
less than the magic number, a theory has been proved. It does not matter that
frequentist statistical theory insists that this is not so. It is what everybody
believes. And the belief is impossible to eradicate. For that reason alone, it’s
time to retire p-values.
As stated, Bayes factors come closer to the mark, but since they are stated
in terms of unobservable parameters, their use will always lead to over-certainty.
This is because we are always more certain of the value of parameters than we are
of observables. This is obvious since the posterior of any parameters feeds into
the equations for the predictive posterior of observables. Take an easy example.
Suppose we characterize the uncertainty in the observable y using a normal with
known parameters. Obviously, we are more uncertain of the observable than
the parameters, which are known with certainty. If we then suppose there is
uncertainty in the parameters (perhaps supplied by a posterior, or by guess), we
have to integrate out this new uncertainty in the parameters, which increases the
uncertainty in the observable. For these reasons, we do not comment further on
Bayes factors, though we do use what is usually considered an objective Bayes
framework, suitably understood, to produce predictions. Frequentist probability
predictions can also be used, but with difficulties in interpretation.
We take probability to be everywhere conditional, and nowhere causal, in the
same manner as Briggs (2016), Franklin (2001), Jaynes (2003), Keynes (2004).
Accepting this is not strictly necessary for understanding the predictive position,
but it is for a complete philosophical explanation. This philosophy’s emphasis on
observables and measurable values which inform observables is also important.

2 Predictive Assessment
All quantifiable probability models for observables y can fit this predictive
schema:
Pr(y ∈ s|X, D, M) (1)
where y is the observable of interest (the dimension can be read from the con-
text), s a subset of concern, M is the evidence and premises that suggest the
model form, D is optionally old (or assumed) measurements of (y, x) and X
optionally represents new or assumed values of x. It is well to stress that proba-
bility, like logic, does not restrict itself to statements on observable propositions.
But scientific models do revolve around that which can be measured. Thus, the
only type of models we discuss here will be for observable, i.e. measurable, y.
It is also worth emphasizing M is usually a complex, compound proposition
that includes everything used to judge the model. Statisticians have developed
a shorthand that works well with mathematical manipulations of models, but
which masks important model information. Since nearly all models in practical
use are assigned ad hoc, the masking emboldens the false belief the model used
in an application is the correct model, or at least one “close enough” to the true
one. This over-emphasizes the importance of hypothesis testing, leading to over-
certainty that causal, or semi-causal “links”, have been properly identified. And
this in turn has led to a most unfortunate non-practice of model verification.
It is rare to never that the vast army of published models ever undergo testing
against the real world. About that subject, more below.
The majority of probability models follow one of two basic forms. Paradig-
matic examples:
MD = “A 6-sided object with sides labeled 1–6, which will be tossed, after
which one side must show”. The observable y is the side, with s = 1 · · · 6. Then
Pr(y = i|MD ) = 1/6, ∀i. About why this deduction holds, and about why we believe
we can deduce probability and why we do not believe probability is subjective,
we relegate to Briggs (2016).
Mtemp = “The uncertainty of tomorrow’s high temperature quantified by a
normal distribution, whose central parameter μ is a function of yesterday’s high
and an indicator of precipitation”; i.e. a standard regression.
MD has no parameters and requires no old observations. Its general form is
MP = P1 P2 · · · Pm , where each P is a premise as in a logical argument, and the
model itself is a conjunction of these premises. Each of the P may be arbitrarily
complex.
Mtemp is a parameterized model typically requiring old observations, and
in Bayesian analysis evidence on the uncertainty of the parameters, i.e. prior
distributions. The evidence suggesting the priors is assumed to be part of Mtemp .
Of course, there may, and even must, be some number of premises P included in
parameterized models. The one that must be present is the one identifying the
parameterized model. E.g. P = “Uncertainty in the observable will be quantified
with a normal distribution”. This P is almost always ad hoc. This does not mean
not useful.
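
To make Eq. (1) concrete for the Mtemp example, the predictive probability for any set s can be read off posterior predictive draws. The following sketch is ours, not the authors' code: the data frame temps and its columns high, yesterday_high and precip are hypothetical, and rstanarm is simply one convenient way to obtain the draws.

# Sketch: Pr(y in s | X, D, Mtemp) by posterior predictive simulation in R.
library(rstanarm)
fit <- stan_glm(high ~ yesterday_high + precip, family = gaussian(),
                data = temps, refresh = 0)            # D and Mtemp
new_x <- data.frame(yesterday_high = 32, precip = 0)  # the new X
y_rep <- posterior_predict(fit, newdata = new_x)      # draws of tomorrow's high
mean(y_rep > 30 & y_rep <= 35)                        # Pr(y in s) for s = (30, 35]
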
Classical hypothesis testing in frequentist or Bayesian terms is usually applied
to parametric models, with the goal of model selection, a potentially confusing
term, as we shall see. The general idea is simple. In its most basic form, two
models are proposed, parameterized or not, both identical except one will have
one less premise or parameter. For example:

MPa : P1 P2 · · · Pm−1 Pm    (2)
MPb : P1 P2 · · · Pm−1    (3)
Mθ1 : μ = θ0 + θ1 x1 + θ2 I(x2 )    (4)
Mθ2 : μ = θ0 + θ1 x1    (5)

where in the first set of comparisons MPb has one fewer premise than does MPa .
In the second set of comparisons x1 might be, from the example above, yesterday's
high temperature, and I(x2 ) the indicator of precipitation. The ordering of more
to less complex models does not, of course, matter.
Predictive selection for premise-based models is simplicity itself. But don’t
let its simplicity fool you. It contains the very basis of how models are actually
built. Calculate

Pr(y ∈ s|X, D, MPa ) = p + δ    (6)
Pr(y ∈ s|X, D, MPb ) = p    (7)

Using the nomenclature of Keynes, premise Pm is irrelevant to y at s if δ = 0 (the obvious restrictions on the values of p and δ apply); otherwise it is relevant.
Using the example above with MD remaining the same, and letting MD+1 = MD
& “Candy canes have peppermint flavoring.” Then

Pr(y ∈ s|MD+1 ) = 1/6 + 0 = 1/6, ∀s    (8)
Pr(y ∈ s|MD ) = 1/6, ∀s.    (9)

Obviously, the flavoring of candy canes is irrelevant to knowing which side of a
die will show. At no value of s was δ non-zero. The premise is therefore rejected.
The example is silly, but it highlights an important truth. All models are built
like this. Scores of irrelevant premises are rejected at the outset, with little or no
thought. This is the right thing to do, too. Yet it is the reason the premises are
rejected that is important. Model builders reject premises because they know
the probability of the observable y at some measurable x will not change. If
you like, we can say that the hypothesis that the premise is relevant has been
rejected—and rejected absolutely.
Hypothesis testing, then, begins well before any p-value is calculated or even
data collected. It does not reach any level of formality until well down the road.
This is interesting because if people were truly serious about the theory behind
p-values, to remain consistent with that theory, p-values (and Bayes factors)
should be used to rule out every hypothesis not making it into the final model.
Now every is a lot; indeed, it is infinite. Since any hypothesis not making it into
the final model must be rejected in the formal way, true p-value and Bayes factor
believers would thus never finish testing. No model would ever get built in finite
time.
What we are proposing is an approach which is everywhere consistent. And
which produces no paradoxes.
In the case of comparing parameterized probability models, there is uncer-
tainty in which model is “better”. But there is no uncertainty in calling any
model true, if that word is meant in the causal sense. None but the strictly
causal (perfectly predictive) model is true. If we knew the actual cause of y, or
what determines the value of y, then we would not need a probability model.
Causal models are not impossible, or even rare. Physics is awash in causal and
deterministic models (to know the cause is greater than to know what determines
a value).
Most, or even all, statistical models are ad hoc. In the temperature example,
it is obvious many other parameterized, and even unparameterized, models could
have been used to express uncertainty in y. Not just in the sense that extra terms
could be added to the right hand side of the regression, but entirely different
model structures. Normal distributions do not have to be used, for instance. The
model need not be linear in the parameters. The possibilities for ad hoc models
are limitless.
That is what makes talk of “true” values of the parameters curious. Since
statistical models are ad hoc and not true in any causal sense, and since nearly all
models do not specify the precise and total circumstance of an observable (i.e. all
auxiliary premises, see Trafimow (2017)), it is vain to search for “true” values of
parameters. Even at a hypothetical, never-will-be-reached limit. Again, physics
comes closest to an apt understanding of true values of parameters, because
there carefully controlled experiments can be run that delineate all the (known)
possible causal factors. In these limited circumstances, it makes more sense to
speak of true parameter values. Parameters in this sense often have physical
meaning, at least by proxy. But, again, this does not hold for the vast majority
of probability models.
Predictive selection for parametric models is as easy as above. Calculate

Pr(y ∈ s|X, D, M1 ) = p + δ    (10)
Pr(y ∈ s|X, D, M2 ) = p    (11)

Assume M1 is the model with the greater number of parameters. Again, we
assume the obvious numerical restrictions of p and δ. If at s, and given X and D,
δ = 0, the parameter(s) in M1 , and therefore the measurements associated with
those parameters, are irrelevant to the uncertainty of y. These X, and these
parameters, are therefore not needed in the model. Removing them does not
change the probability. The models in (10) are predictive, meaning the uncer-
tainty in the parameters given by priors is integrated out. Yet even frequentists
can use this method, as long as probability predictions can be made from the
frequentist model.
If at any s, for the given X and D, δ ≠ 0, then the X and its parameters
are relevant. Whether to keep the extra parameters becomes a standard prob-
lem in decision analysis. A relevant parameter important to one decision maker
can be unimportant to another. There can be no universal value of δ useful
in all situations, like there is with the magic number for p-values. As should
be clear, relevance depends on s and on everything on the right hand side of
the probability equation. That means any change on the right hand side might
change the measure of relevance. That accords with common sense: change your
information, change your basis of judgment.
In practice, on a per-model, per-decision basis, a δ is chosen, which may
depend on s, below which measurements are decided to be unimportant, and
above which are important. Measurements, and their associated parameters, are
kept or discarded accordingly.
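
As a minimal illustration of this decision rule (our sketch, not part of the original analysis), suppose p1 and p2 hold the predictive probabilities under the larger and smaller models over the same grid of s:

# Sketch: per-s relevance and importance checks between two predictive models.
# p1, p2: Pr(y in s | X, D, M1) and Pr(y in s | X, D, M2) over a common grid of s;
# delta_star is the analyst's decision-specific threshold.
delta     <- function(p1, p2) p1 - p2
relevant  <- function(p1, p2) any(delta(p1, p2) != 0)
important <- function(p1, p2, delta_star) abs(delta(p1, p2)) >= delta_star
# e.g. important(p1, p2, 0.05) flags the values of s at which the extra
# measurement changes the predictive probability by at least 0.05.
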
An additional advantage of this approach is that no parameter estimates
are needed, or even desired. Parameters are not in any case observable. The
models are already ad hoc anyway, so focusing on parameter estimates, either
as a Bayesian posterior or a frequentist point estimate with confidence interval,
produces over-certainty in any X’s importance. The predictive approach thus
unifies testing and point estimation.
Not only can (10) be used in intra-model selection, but it is ripe for estimating
the probabilistic importance of each X. It will often be found that a model with
multiple parameters will show a wee p-value and large (relative) point estimate
for one parameter, and a non-publishable p-value and small point estimate for
the second parameter. But when (10) is employed, the order of importance is
inverted. Changing the value of X for the classically “weaker” parameter will
produce larger variations in probability of y ∈ s, especially for values of s thought
crucial in the problem at hand.
3 Examples
3.1 Example 1: Product Placement Recall
We begin for the sake of clarity with the simplest of examples. A survey relating the ability to recall product placement in theater films to movie genre (Action, Comedy, Drama) and sex was given to 137 people, each giving a response (a score) counting the number of correct recalls in the discrete interval 0–6, Park and Berger (2010). The data were initially analyzed using null hypothesis
significance testing. The conclusion of the authors was “Results suggest that
brand recognition is more common in drama films.”
An ordinary regression in R on the score by sex (M = 1, or 0) and movie
genre was run, producing the following ANOVA table (sans hyperventilating
asterisks).

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.4994 0.1930 18.134 <2e-16
M1 0.3952 0.2489 1.588 0.1147
GenreComedy 0.4087 0.2712 1.507 0.1342
GenreDrama 0.7077 0.2792 2.535 0.0124
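
A call along the following lines would reproduce a table of this kind; the object names (recall, score, sex, genre) are placeholders of ours, not the authors' actual code.

# Sketch of the ordinary regression behind the table above; `recall` is a
# hypothetical data frame with columns score (0-6), sex (M/F) and genre
# (Action, Comedy, Drama).
fit <- lm(score ~ sex + genre, data = recall)
summary(fit)$coefficients   # estimates, standard errors, t values, p-values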

The p-values for sex (difference) and Comedy were larger than the magic
number. Some authors would at this point remove sex from the model. The
p-value for Drama was publishable, hence the conclusion of the authors.
Predictive probabilities of the full model were calculated, assuming standard
out-of-the-box “flat” priors. Posteriors on the parameters were first calculated,
then these were integrated out to produce the predictive posterior of the observ-
able score, see Bernardo and Smith (2000). The results would, of course, change
with a different prior; but so would they change with a different model. We are
not recommending this model, and certainly not recommending flat priors; we
are only showing how the predictive approach works in a common situation.
There is a bit of difficulty in creating predictive probabilities, because the
scores can only take the values 0–6, but the standard normal regression model
produces predictive probability densities along the continuum. Indeed, the model
produces predictions of positive probabilities for values of scores less than 0 and
greater than 6, scores that will never be seen (they are impossible) in any repeat
of the experiment. We elsewhere call the assignment of positive probability to
impossible events probability leakage, Briggs (2013). It usually shows up when
regression models do not make good approximations and when the observable
lives in a limited range, or when the observable’s discreteness is stark.
In this case, for males, the predictive probabilities for scores greater than 6
are 0.06 for Action, 0.1 for Comedy, and 0.15 for Drama (these are probabilities
for known impossible values). In other words, given the person is a male assessing
a Drama, the model predicts a probability of 0.15 for new scores greater than
6. For females, the numbers are 0.03, 0.06, and 0.09 respectively. Not small
numbers. For scores less than 0, the predictive probabilities for men are all
less than 0.001; for women the largest is 0.003. Whether any of these numbers
is important depends on the decisions to which the model is put, and not on
whether any statistician thinks them small or large. About these decisions, we
are here agnostic.
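
Under flat priors the posterior predictive of a normal regression is a shifted and scaled Student-t, so leakage probabilities of this kind can be computed directly. The sketch below is ours (continuing the hypothetical fit above); the design vector follows the coefficient ordering printed in the table.

# Sketch: probability leakage for a male assessing a Drama, using the flat-prior
# Student-t predictive of the regression `fit` sketched earlier.
x0  <- c(1, 1, 0, 1)                 # intercept, M1, GenreComedy, GenreDrama
m0  <- sum(x0 * coef(fit))           # predictive center
s   <- summary(fit)$sigma
df  <- fit$df.residual
sc  <- s * sqrt(1 + drop(t(x0) %*% summary(fit)$cov.unscaled %*% x0))
1 - pt((6 - m0) / sc, df)            # Pr(new score > 6): an impossible value
pt((0 - m0) / sc, df)                # Pr(new score < 0)
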
The next decision is how to turn the predictions which are over the real
line to predictions of discrete observable scores. One way of doing this, which
is not unique, is to calculate the predictive probability for being between 0 and
0.5, and assign that to a predictive probability of score = 0; next calculate the
predictive probability for being between 0.5 and 1, and assign that to a predictive
probability of score = 1; and so on. The probability of 5.5 to 6.5 can be assigned
to score = 6, with the remainder being left to leakage, or everything greater than
5.5 can be assigned to score = 6; correspondingly, everything less than 0.5 can
be assigned score = 0. Now all this rigmarole would not have been necessary
if a model which only allowed scores 0–6 were used (perhaps a multinomial
regression). But our purpose here is not to find terrific or apt models; we only
want to explain how to use the predictive approach for models people routinely
use.
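
Continuing the same sketch (reusing m0, sc, and df from above), the binning just described amounts to differencing the predictive distribution function at cut points half a unit below and above each score, here with the open-ended bins absorbing everything below 0.5 into score 0 and everything above 5.5 into score 6.

# Sketch: discretize the continuous t-predictive to the scores 0..6.
cuts    <- c(-Inf, seq(0.5, 5.5, by = 1), Inf)
p_score <- diff(pt((cuts - m0) / sc, df))
names(p_score) <- 0:6
round(p_score, 2)   # one row of a table in the style of Table 1
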
It is crucial to understand that in creating predictive probabilities, as in
Eq. (6), the model must be fully specified in each prediction. In other words, we
created a model of sex and genre because we thought these measurements would
change the uncertainty in the score, therefore for each and every prediction we
make, we must specify a value of sex and genre.
Figure 1 shows the predictive probability for men for each genre. Clearly, the differences in these probabilities are non-zero, hence δ ≠ 0; so, genre is relevant to uncertainty in score. The differences in probabilities clearly depend on the level
of score (the s), ranging from about 0.001 (in absolute values) for s = 1, up to
0.14 for s = 6. Again, whether these differences are important depends on the
decisions to which the model will be put. Supposing for the sake of argument a
δ = 0.05 (a familiar number!) to indicate importance, then there are no important
differences in probabilities between Action and Comedy for scores of 0–2 and
4–5 but there are for scores of 3 and 6. The p-value would lead to the decision
of no difference between Action and Comedy. But with our chosen δ, there is a
clear difference in importance.
Now the same plot (or calculations: visual inspection is not necessary) should
be done for females by genre, and the differences assessed there too. We skip that
step, noting that the important differences exist here, too, and for different scores
for the genres. We instead show Fig. 2, the differences in sex at the Drama genre.
The differences (in absolute value) are between 0.002 and 0.08. The importance
δ is exceeded at scores of 3 and 6.
Again, the p-value for sex was not wee, and sex might have been dropped
from the model. The important differences noted for Drama were also found for
Comedy, but not for Action, though these were not noted by the p-values.
This level of detail in an analysis won’t always be needed. Instead, tables like
the following can and should be presented. Plots and summaries may of course
be better, depending on the situation. Here there are two different regression
Fig. 1. Predictive probability of score for each genre, for men.

Table 1. Probabilities (rounded to nearest hundredth) for scores 0–6 for the genre
Drama, with and without considering sex, in two separate regression models.

Sex s=0 s=1 s=2 s=3 s=4 s=5 s=6
Either 0.00 0.01 0.06 0.17 0.28 0.27 0.20
Male 0.00 0.01 0.05 0.15 0.27 0.28 0.25
Female 0.00 0.02 0.08 0.20 0.29 0.25 0.16

models, the first without sex and the second with. Readers are free to make
decisions based on their own δs, which might differ from the authors’.

3.2 Example 2: Professor’s Salaries


This next example shows the flexibility of the predictive method, and its poten-
tial for partial automation. Full automation of analysis is not recommended for
any model, except in special circumstances. Automation can cause one to forget
limitations.
Fig. 2. Predictive probability of score for men and women for the Drama genre.

Nine-month salaries for 2008–2009 were collected on 397 academics at various
ranks for a college in the USA for two departments A and B “roughly correspond-
ing to theoretical disciplines and applied disciplines, respectively”, quoted from
Fox and Weisberg (2011). Faculty sex, years since PhD and years of service were
also measured. The minimum measured salary was $57,800 and the maximum
was $231,545, proving at least one of us is in the wrong job.
Obviously, we use this data to make predictions of people not in this data
set, because we already know all we can about the salaries of people we have
already measured.
That is, we desire naturally to make predictions.
Here is the ordinary ANOVA table:

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 78.8628 4.9903 15.803 < 2e-16
rankAsstProf -12.9076 4.1453 -3.114 0.00198
rankProf 32.1584 3.5406 9.083 < 2e-16
disciplineB 14.4176 2.3429 6.154 1.88e-09
yrs.since.phd 0.5351 0.2410 2.220 0.02698
yrs.service -0.4895 0.2119 -2.310 0.02143
sexMale 4.7835 3.8587 1.240 0.21584
The standard ANOVA tells us little about predictions. That is easily reme-
died in Table 2, which we label the predictive “ANOVA” table. It uses the same
regression model with (again) “flat” out-of-the-box priors. It shows the central
(most likely) estimate for the condition noted, holding all other mea-
surements fixed at their observed median values or base levels (to be defined
below). The categorical variables are stepped through their levels, while the
others step through the first, second, and third observed quartiles. Any other
values of special interest may of course be substituted, but we leave these to
demonstrate how an automatic analysis might look.

Table 2. Predictive “ANOVA” table for salaries.

Variable Level Central Salary ($1,000s) Pr(Salary > base level)
rank AssocProf 101 0.5
rank AsstProf 88.6 0.343
rank Prof 134 0.844
discipline A 119 0.5
discipline B 134 0.675
yrs.since.phd 5 125 0.5
yrs.since.phd 21 134 0.606
yrs.since.phd 40 144 0.719
yrs.service 3 140 0.5
yrs.service 16 134 0.421
yrs.service 37 123 0.302
sex Female 129 0.5
sex Male 134 0.56

This Table also shows the predicted probability that a person holding these
attributes would have a higher salary than a “base level” person. The base level
is not unique and can be user specified as a particular level of interest. Here we
take the first level of all other categorical measures as ordered (alphabetically)
by R. The first level of rank is “AssocProf”, with “AsstProf” coming after,
alphabetically. The non-categorical measures take as base their observed first
quartile values.
For example, the predicted most likely salary for an Associate Professor in
discipline B (the median), and male (also median), with 21 years since PhD and
16 years of service is $101 thousand. The probability that another person at the base level, which in this case is a person with the same attributes, has a higher salary is, as expected, 0.5
(in this model, the posterior predictive distributions are all symmetric around
the central value). We next hold all these attributes constant, but change the
rank, so that we now have a new male Assistant professor in discipline B with
21 years since PhD and 16 years of service. The probability this new man has a
higher salary is 0.34, meaning, of course, a man with the higher rank has a
probability of 0.66 of having a higher salary.
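
The probabilities in the last column of Table 2 can be reproduced approximately by simulation from the flat-prior predictive. The sketch below is ours, not the authors' code; it assumes the Fox and Weisberg Salaries data with salary rescaled to $1,000s as in the printed table, design vectors ordered as in the coefficient table above, and it treats the two predictive draws as independent, which slightly overstates the spread of their difference.

# Sketch: Monte Carlo approximation of Pr(Salary > base level) in Table 2.
data(Salaries, package = "carData")              # "car" in older versions
Salaries$salary <- Salaries$salary / 1000        # work in $1,000s as in the table
Salaries$rank   <- factor(as.character(Salaries$rank),
                          levels = c("AssocProf", "AsstProf", "Prof"))
fit <- lm(salary ~ rank + discipline + yrs.since.phd + yrs.service + sex,
          data = Salaries)
pred_draw <- function(x0, n = 1e5) {             # draws from the t predictive
  m  <- sum(x0 * coef(fit))
  sc <- summary(fit)$sigma *
        sqrt(1 + drop(t(x0) %*% summary(fit)$cov.unscaled %*% x0))
  m + sc * rt(n, df = fit$df.residual)
}
# Assumed coefficient order: intercept, AsstProf, Prof, disciplineB,
# yrs.since.phd, yrs.service, sexMale. Base person: male AssocProf, discipline B,
# 21 years since PhD, 16 years of service; comparison identical except AsstProf.
x_base <- c(1, 0, 0, 1, 21, 16, 1)
x_asst <- c(1, 1, 0, 1, 21, 16, 1)
mean(pred_draw(x_asst) > pred_draw(x_base))      # roughly the 0.34 quoted above
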
These tables take only a little getting used to, and they are easily modified, as
a standard ANOVA is not, for questions interesting to decision makers. Relevance
can be picked off the table: any probability differing from 0.5 shows relevance,
at least for the levels specified. Direct information about the observable is also
prominent.
This table does not obviate a fuller analysis, as was done above in the first
example. Plots and tables of the same sort can and should be made. For example,
as in Fig. 3.

Fig. 3. Predictive probability differences between men and women in discipline B for
new Assistant Professors in black (0 years of service, 1 year since PhD) and for seasoned
Professors in red (24 years of service, 25 years since PhD). Probabilities are calculated
every $5,000.

This shows the predictive probability differences between men and women in
discipline B for new Assistant Professors in black (0 years of service, 1 year since
PhD) and for seasoned Professors in red (24 years of service, 25 years since PhD).
Probability differences are calculated every $5,000. Most of these differences are
0.01, or less. The largest difference was for new hires at a salary lower than was
observed. The implication is that while there were observed differences in salaries
between men and women, the chances are not great for seeing them persist in
new data. At least, not for individual salaries. Calculating the differences over
larger “block” sizes of salaries, say, every $10 or $15 thousand would show larger
differences.
4 The Conclusion Lies in Verification


The predictive approach does not solve all modeling ills. No approach will. It
reduces some, but only some, of the excesses in classical hypothesis testing.
Although we advise against a universal, one-size-fits-all value of δ, all experience
shows such a value will be picked. Doing so makes model selection and presenta-
tion automatic. People prefer less work to more. The predictive approach clearly
entails more work than standard hypothesis testing in every aspect. As such,
there will be reluctance to use it. It also does not provide answers that are as
sharply defined as hypothesis testing. And people crave certainty—even when
this certainty is exaggerated, as it is with classical hypothesis testing. Every
statistician knows how easy it is to “prove” things with p-values.
Any approach that does not add model verification to model selection is
doomed to failure. Models must be tested against reality. It is not at all clear
how to do this with classical hypothesis testing. As said above, the idea a “test”
has been passed gives the false impression the model has been checked against
reality and found good.
True verification is natural using the predictive approach. Models under the
predictive approach are reported in probability form. Advanced training in sta-
tistical methods is not needed to understand results. The models reported in
Table 1 require no special expertise to comprehend. These are the (conditional)
probabilities of new scores that might be observed, perhaps depending on the
sex of the participant. “Bets” (i.e. decisions) can be made using this table. Here
the standard apparatus of decision analysis comes into play in choosing which
probabilities are important, and which not. If the model is a good one, the
probabilities will be well calibrated and sharp, when considered with respect to
whatever bets or decisions that are made with it.
Anybody can check a predictive model (given they can recreate the original
scenarios). The original data is not needed, nor the computer code used to gen-
erate it. The model is laid bare for all to see and test. Limitations and strengths,
especially for controversial and “novel” research, will quickly become apparent.
How best to do verification we leave to outside authorities. This list is far
from complete, but a good place to start is here: e.g. Gneiting and Raftery
(2007), Briggs and Zaretzki (2008), Hersbach (2000), Wilks (2006), Briggs and
Ruppert (2005), Briggs (2016), Gneiting et al. (2007). The idea is basic. Produce
predictions and compare these using proper scores against observations never
used or seen in any way before. This is the exact way civil engineers test models of
bridges, or electrical engineers test models of cell phone capacity, etc. The “never
used” is strict, and thus excludes cross validation and other approaches which
reuse or “peek” at verification datasets when building a model. It’s not that
these methods don’t have good uses, but that they will always inflate certainty
in the actual value of a model.
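
As a small illustration of what such out-of-sample verification could look like for the recall model of Example 1 (our sketch, with hypothetical objects p_new and y_new), two common proper scores are the logarithmic score and the multi-category Brier score:

# Sketch: proper scores for predictions of the discrete score 0..6 on data never
# used in fitting. p_new: matrix of predictive probabilities (rows = new people,
# columns = scores 0..6); y_new: the scores actually observed afterwards.
log_score <- function(p_new, y_new)
  -mean(log(p_new[cbind(seq_along(y_new), y_new + 1)]))
brier_score <- function(p_new, y_new) {
  obs <- diag(7)[y_new + 1, , drop = FALSE]   # one-hot coding of observed scores
  mean(rowSums((p_new - obs)^2))
}
# Lower is better for both; compare rival models (with and without sex, say) on
# the same held-out people.
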
Verification, like model building, is not exact, and cannot be. We must guard
against the idea that if a theory has passed whatever test we devise, we have the
best or a unique theory. Verification is not proof. Quine and Duhem long ago
showed theories or models besides the one under consideration and testing could
equally well explain any set of observed (contingent) data, Quine (1953), Duhem
(1954). And when testing, the auxiliary assumptions (all implicit premises) of a
model can be difficult or impossible to disentangle; see Trafimow (2009), Trafi-
mow (2017) for a discussion. What can be said is that given past good perfor-
mance of a model, and taking care the conditions in all explicit and implicit
premises are also met, it is likely the model will continue to perform well.

References
Begley, C.G., Ioannidis, J.P.: Reproducibility in science: Improving the standard for
basic and preclinical research. Circ. Res. 116, 116–126 (2015)
Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. Wiley, New York (2000)
Breiman, L.: Statistical modeling: the two cultures. Stat. Sci. 16(3), 199–215 (2001)
Briggs, W.M.: On probability leakage. arXiv:1201.3611 (2013)
Briggs, W.M.: Uncertainty: The Soul of Probability, Modeling & Statistics. Springer,
New York (2016)
Briggs, W.M., Ruppert, D.: Assessing the skill of yes/no predictions. Biometrics 61(3),
799–807 (2005)
Briggs, W.M., Zaretzki, R.A.: The skill plot: a graphical technique for evaluating continuous diagnostic tests. Biometrics 64, 250–263 (2008). (With discussion)
Duhem, P.: The Aim and Structure of Physical Theory. Princeton University Press,
Princeton (1954)
Fox, J., Weisberg, S.: An R Companion to Applied Regression, 2nd edn. SAGE Publications, Thousand Oaks (2011)
Franklin, J.: Resurrecting logical probability. Erkenntnis 55, 277–305 (2001)
Gigerenzer, G.: Mindless statistics. J. Socio Econ. 33, 587–606 (2004)
Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation.
JASA 102, 359–378 (2007)
Gneiting, T., Raftery, A.E., Balabdaoui, F.: Probabilistic forecasts, calibration and
sharpness. J. R. Stat. Soc. Ser. B Stat. Methodol. 69, 243–268 (2007)
Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble
prediction systems. Weather Forecast. 15, 559–570 (2000)
Hitchcock, C.: Probabilistic causation. In: The Stanford Encyclopedia of Philosophy
(Winter 2016 Edition) (2016). https://plato.stanford.edu/archives/win2016/entries/
causation--probabilistic
Jaynes, E.T.: Probability Theory: The Logic of Science. Cambridge University Press,
Cambridge (2003)
Keynes, J.M.: A Treatise on Probability. Dover Phoenix Editions, Mineola (2004)
Mulder, J., Wagenmakers, E.J.: Editor’s introduction to the special issue: Bayes factors for testing hypotheses in psychological research: Practical relevance and new developments. J. Math. Psychol. 72, 1–5 (2016)
Nguyen, H.T.: On evidence measures of support for reasoning with integrated uncertainty: a lesson from the ban of p-values in statistical inference. In: Huynh, V.N., Inuiguchi, M., Le, B., Le, B., Denoeux, T. (eds.) Integrated Uncertainty in Knowledge Modelling and Decision Making, pp. 3–15. Springer, Cham (2016)
Nosek, B.A., Alter, G., Banks, G.C., et al.: Estimating the reproducibility of psychological science. Science 349, 1422–1425 (2015)
Park, D.J., Berger, B.K.: Brand placement in movies: the effect of film genre on viewer
recognition. J. Promot. Manag. 22, 428–444 (2010)
Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press,
Cambridge (2000)
Quine, W.V.: Two Dogmas of Empiricism. Harper and Row, Harper Torchbooks,
Evanston (1953)
Trafimow, D.: The theory of reasoned action: a case study of falsification in psychology.
Theory Psychol. 19, 501–518 (2009)
Trafimow, D.: Implications of an initial empirical victory for the truth of the theory
and additional empirical victories. Philos. Psychol. 30(4), 411–433 (2017)
Trafimow, D., Marks, M.: Editorial. Basic Appl. Soc. Psychol. 37(1), 1–2 (2015)
Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, 2nd edn. Academic Press,
New York (2006)
On Quantum Probability Calculus for Modeling Economic Decisions

Hung T. Nguyen1(B), Songsak Sriboonchitta2, and Nguyen Ngoc Thach3

1 Department of Mathematical Sciences, New Mexico State University, Las Cruces, NM 88003, USA
  hunguyen@nmsu.edu
2 Faculty of Economics, Chiang Mai University, Chiang Mai 50200, Thailand
  songsakecon@gmail.com
3 Banking University of Ho-Chi-Minh City, 36 Ton That Dam Street, District 1, Ho-Chi-Minh City, Vietnam
  thachnn@buh.edu.vn

Abstract. In view of the Nobel Memorial Prize in Economic Sciences
awarded to Richard H. Thaler in 2017 for his work on behavioral eco-
nomics, we address in this paper the fundamentals of uncertainty mod-
eling of free will. Extensions of von Neumann’s expected utility theory
in social choice, including various nonadditive probability approaches,
and prospect theory seem getting closer to cognitive behavior, but still
ignore an important factor in human decision-making, namely the so-
called “order effect”. Thus, a better candidate for modeling quantita-
tively uncertainty, under which economic agents make their decisions,
could be a probability calculus which is both nonadditive and noncom-
mutative. Such a probability calculus already exists, and it is called
“quantum probability”. The main goal of this paper is to elaborate on
the rationale of using quantum stochastic calculus in decision-making for
econometricians, in a conference such as this, who are not yet aware of
this new trend of ongoing research in the literature.

Keywords: Behavioral economics · Choquet capacity · Expected utility · Nonadditive probability · Noncommutativity · Quantum probability calculus

1 Introduction
In 1951, Feynman [11] came to a symposium on mathematical statistics and
probability at the University of California, Berkeley, to let probabilists and statis-
ticians know that the quantitative notion of chance in the context of intrinsic
randomness in quantum mechanics is not the same as the one considered much
earlier by Laplace (and hence different from the more general probability concept
formulated by Kolmogorov in 1933, which is the standard quantitative modeling
of uncertainty used in almost all sciences). That did not seem to get any atten-
tion of probabilists and statisticians, perhaps because the nonadditivity (and
noncommutativity) of quantum probability might be viewed as “appropriate”
only for “quantum uncertainty” in physics, and not for “ordinary” uncertainty
encountered in games of chance and ordinary random phenomena in social sci-
ences. The most significant efforts on developing further quantum probability,
on purely mathematical grounds, were Meyer [23] and Parthasarathy [26]. How-
ever, there seems to be no reasons for thinking about using quantum probability
outside of physics of particles, let alone in standard statistics.
Unlike Newtonian mechanics, quantum mechanics is intrinsically random,
and as such, deterministic laws should be replaced by stochastic models. In
physics, such an approach is called an “effective theory” [17], i.e., a framework
created to model observed phenomena without describing in detail all of the
underlying processes. The situation seems similar as far as predicting human
behavior is concerned. Specifically, as Hawking put it [17] (p. 47): “Economics
is also an effective theory, based on the notion of free will plus the assumption
that people evaluate their possible alternative courses of action and choose the
best. That effective theory is only moderately successful in predicting behavior because, as we all know, decisions are often not rational or are based on a defective analysis of the consequences of the choice. “That is why the world is in
such a mess”.
Since economics (and hence economic data) is “created” by humans (eco-
nomic agents), the buzzword is “free will”. It is not possible to have physical laws
to predict human behavior, we propose models, such as Von Neumann’s expected
utility as a social choice theory, based on rationality and rational “degrees of
belief” (Kolmogorov probability which is an additive and commutative theory).
On the other hand, as Hawking reminded us “Since we cannot solve the equa-
tions that determine our behavior, we use the effective theory that people have
free will. The study of our will, and of the behavior that arises from it, is the
science of psychology". As such, not only should the study of economics be approached from a psychological perspective (called behavioral economics), but psychology should also play an essential role in economics, somewhat as experiments do in physics to validate models.
It all boils down to uncertainty, the “soul” of modeling, probability and
statistics, as Briggs put it [3]. In an effective theory like economics, where free
will is a consequence of the human mind (a human has two ingredients: a body and
a soul; the body is a machine, and hence subject to physical laws, but the soul,
living somewhere in the brain, is something different from the physical world,
and hence is not subject to physical laws), the “soul” is in fact the source of
uncertainty we face!
Except for death and taxes, everything is uncertain. But while uncertainty
has a general, common sense meaning, it is specific in each context. For exam-
ple, we can talk about uncertainty in physics, or in economics. In an uncertain
situation, when we talk about “the chance” of something, we implicitly refer to
a quantitative notion of chance, i.e., some measurement of uncertainty. In the
recent book “Ten Great Ideas about Chance” [8], Diaconis and Skyrms discussed
all quantitative approaches to measuring chance, i.e., various probability calculi.
Of course, all the probability calculi discussed there are additive and commutative.
Do we already have a unique theory of probability to model quantitatively the
notion of uncertainty? In their ninth great idea “physical chance”, they wrote (p.
180) “Does quantum mechanics need a different notion of probability? We think
not". Well, of course not; as Richard Feynman said clearly [11], "the concept of probability is not altered in quantum mechanics". (Note that Richard Feynman's paper [11] was not listed in [8].) The point Feynman wanted to make, rather, is this: "What is changed, and changed radically, is the method of calculating probabilities", i.e., another probability calculus, and that is so because, in the
context of quantum mechanics (where the uncertainty is due to “free will” of
particles to choose paths to travel) “Nature with her infinite imagination has
found another set of principles for determining probabilities; a set other than that
of Laplace, which nevertheless does not lead to logical inconsistencies”. Thus,
the point is this. A quantitative theory of chance should be developed in each
uncertain environment, no single universal probability calculus is appropriate
for all situations. The authors of [8] declared that they are thoroughgoing Bayesians, i.e., adopting epistemic uncertainty and using Bayes' rule and the additive calculus of
probabilities. In Sect. 2 below, a typical experiment of W. Edwards revealed that,
in social sciences, people may not behave according to Bayes’ updating rule! See
also Gelman’s comments on this issue [14].
It was von Neumann who provided the mathematical language for quantum
mechanics (the counterpart of Newton’s calculus for his own mechanics), but it
seems he did not pay much attention to its probabilistic aspect, i.e., quantum probability, except for its logic (quantum logic), let alone use quantum probability elsewhere. Instead, when moving to the social sciences, he used
Kolmogorov probability to formulate his effective theory of free will (economic
behavior) in [32]. See also [21].
Von Neumann's expected utility [32] is the norm for human behavior in economics, in areas such as game theory (economic competition) and social choice. But it is only a "hypothesis", or if you like, a "model" of human behavior. As a model, it needs to be validated before use. As in physics, a model is "reasonable" (to be considered a "law") if there are no experiments contradicting its predictions. Note that to validate a model, we use predictions, not "statistical testing"!
Here, von Neumann’s expected utility, as a model of “rational” behavior of free
will, predicts how humans make decisions (choosing alternatives) in the face of
uncertainty/risk. It is right here that psychologists become useful, especially for
economics!
It is well known that Von Neumann’s expected utility theory was contra-
dicted by major "paradoxes" like Allais [1] and Ellsberg [9]; see also [12,13]. The
main root of these violations of expected utility is the additivity axiom of Kolmogorov
probability. Using the spirit of physics, when a proposed model (here,
for describing human behavior under uncertainty) is violated by “experiments”
(here, psychological experiments), it has to be reexamined and modified. Now,
the concept of expected utility is defined with respect to standard probabil-
ity measures which are additive set-functions. It is the additivity of probability
measures which causes Ellsberg's paradox. Thus, the first thing to look at is
modifying this additivity axiom for a measure of uncertainty. This is somewhat
similar to physics: Depending upon environments, physical laws are different:
while Newton’s laws in mechanics are appropriate for motion of macro-objects
(at moderate velocities), we need to modify them for velocities near the speed
of light (Einstein’s relativity), or replace them when dealing with micro-objects
like particles. We emphasize an important point: replacing does not necessarily mean "destroying". Quantum mechanics did not destroy Newtonian mechanics; it just applies to another environment. Here, Kolmogorov probabil-
ity, and von Neumann’s expected utility criterion are appropriate for, say, natural
phenomena; whereas, as we will elaborate upon, quantum probability (because of
its properties of nonadditivity and noncommutativity, and not really because we
think that particles and humans have something in common, or similar, namely
“free will”, where by free will we mean the freedom of choices, regardless of the
cause of free will), seems appropriate for the social sciences. Let us be clear. We will
use the term “quantum probability” because, as Richard Feynman has pointed
out to everybody, it is an uncertainty measure which has precisely two desirable
properties (nonadditivity and noncommutativity) for modeling humans' free will,
and not because of quantum mechanics!
The above mentioned “paradoxes” have triggered efforts to modify stan-
dard probability (as the main ingredient in a decision theory) to various types
of “nonadditive probabilities”, such as Dempster-Shafer belief functions [5,30],
Choquet capacities [18,31], possibility theory [34], imprecise probabilities [33],
among others, see also e.g., [6,10,12,20,24,26–28].
On the other hand, behavioral economists argued that people make decisions
based on the potential value of losses and gains rather than the final outcome,
as well as evaluating these losses and gains using certain “heuristics” [19]. Recall
that the 2017 Nobel Memorial Prize in Economic Sciences was a recognition
of an integration of economics with psychology. It is about time! One question
remains: How to model heuristics?
And more recently, and finally, the light appears! Quantum probability
appears to be the best candidate for modeling human uncertainty. See e.g.,
[2,4,7,16,29].
Since this paper aims simply at calling economists’ attention to a promising
approach to faithfully model the uncertainty under which humans and economic
agents make their decisions, we will be somewhat tutorial to get the message
across. As such, the paper is organized as follows. In Sect. 2, we recall some
basic violations of expected utility. In Sect. 3, we elaborate on some works on
nonadditive probabilities. In Sect. 4, we provide facts from psychological experi-
ments exhibiting more the inadequacy of additivity as well as commutativity of
standard probability measures, as far as modeling cognitive decision-making is
concerned. Section 5 is a tutorial on how quantum probability is built. Finally,
in Sect. 6, we indicate briefly some main aspects of quantum probability calculus
in behavioral finance and economics.
2 Some Violations of Expected Utility


As stated, our aim is to make economists aware of a promising improve-
ment of behavioral economics, namely incorporating quantum probability into a
prospect-based framework. To start out, in this section we recall some
well-known violations of von Neumann’s expected utility (for decision-making
under uncertainty) because of the additivity axiom in Kolmogorov’s formulation
of probability. This will serve as a motivation for considering various approaches
to nonadditive probability, in Sect. 3.
Below, from the literature, are the experiments (violating expected utility)
conducted by psychologists pointing out that von Neumann’s program is in fact
quite limited, triggering (in a healthy spirit of natural science) new developments.
(i) The Allais paradox (1953)
Consider the following gambles:

A: $2500 (0.33), $2400 (0.66), $0 (0.01)
B: $2400 (1.00)
C: $2500 (0.33), $0 (0.67)
D: $2400 (0.34), $0 (0.66)
The experiment consists of asking the participants the following question:
First, choose between gambles A and B, then, next, choose between C and D.
You could be one participant!
It was reported that most participants “behaved” by choosing B in their first
choice, and choosing C in their second choice. What are YOUR choices?
Let's see if their choices are consistent with the expected utility model, or can
the expected utility model explain their experimental choices? For that, consider
one participant who has chosen B over A in her first choice, and C over D in
her second choice. Let u(.) be her utility function. If she follows the expected
utility "rule", then she has chosen B over A in her first choice because

E_B(u) = u(2400) > E_A(u) = (0.33)u(2500) + (0.66)u(2400) + (0.01)u(0)

⟺ (0.33)u(2500) < (0.34)u(2400) − (0.01)u(0)

In her second choice, with the same utility function u(.), she has chosen C
over D because

E_C(u) = (0.33)u(2500) + (0.67)u(0) > E_D(u) = (0.34)u(2400) + (0.66)u(0)

⟺ (0.33)u(2500) > (0.34)u(2400) − (0.01)u(0)

But these inequalities go in opposite directions! Although she has her utility
function, her choices were not dictated by taking expected utility, a clear violation
of the expected utility model.
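For readers who prefer a numerical check, the following minimal Python sketch (our illustration, not part of the original argument) searches over randomly drawn increasing utility functions and confirms that none of them rationalizes both modal choices (B over A and C over D).

```python
# Minimal sketch: no increasing utility function can rationalize the modal
# Allais choices "B over A" and "C over D" simultaneously.
import random

def expected_utility(gamble, u):
    # gamble: list of (payoff, probability) pairs; u: dict payoff -> utility
    return sum(prob * u[payoff] for payoff, prob in gamble)

A = [(2500, 0.33), (2400, 0.66), (0, 0.01)]
B = [(2400, 1.00)]
C = [(2500, 0.33), (0, 0.67)]
D = [(2400, 0.34), (0, 0.66)]

violations = 0
for _ in range(100_000):
    low, high = sorted(random.random() for _ in range(2))  # random increasing utility
    u = {0: 0.0, 2400: low, 2500: high}
    if expected_utility(B, u) > expected_utility(A, u) and \
       expected_utility(C, u) > expected_utility(D, u):
        violations += 1

print(violations)  # always 0: the two strict preferences are incompatible
```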
(ii) Ellsberg’s paradox (1961)
Consider the following situation (we are not actually conducting an experiment!). We have two urns: urn A contains 100 balls, of which 50 are red and 50 are black; urn B also contains 100 balls, but in an unknown proportion of black and red balls. Suppose we ask a person to choose one urn and one color (of balls), then draw a ball from the chosen urn. If the color of the ball drawn from the chosen urn is her chosen color, she wins, say, $70; otherwise, she wins nothing.
Let’s find out whether we can represent her choices by a probability. This
is a choice problem under uncertainty! An alternative θ here is the choice of an
urn and a color, e.g., θ = (A, red). When θ is preferred to θ′, we write θ ≽ θ′. If also θ′ ≽ θ, we write θ ∼ θ′ (we are indifferent about which one); ≻ denotes strict preference.
Suppose P is a probability on the set of alternatives, representing her preference relation, i.e., θ ≽ θ′ ⟹ P(θ) ≥ P(θ′).
Now examining the situation, it is clear that she is indifferent about color,
but not about urn, so that her “behavior” is

(A, red) ∼ (A, black)
(B, red) ∼ (B, black)
(A, red) ≻ (B, red)
(A, black) ≻ (B, black)

implying that

P(A, red) = P(A, black)
P(B, red) = P(B, black)

But, since P is additive (so that P(A, red) + P(A, black) = 1 = P(B, red) + P(B, black)), the above imply that

P(A, red) = P(A, black) = 0.50 = P(B, red) = P(B, black)

contradicting (A, red) ≻ (B, red) and (A, black) ≻ (B, black)!
The conclusion is that there is no probability to represent the person’s behav-
ior. The problem is the additivity of the probability measure P .
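As a small, hedged illustration (ours, not from the original), the sketch below contrasts the additive representation, which is forced to assign 0.5 to all four alternatives, with a hypothetical nonadditive set function of the kind discussed in Sect. 3, which can represent the observed preferences.

```python
# Sketch: an additive P cannot represent the Ellsberg preferences,
# while a (hypothetical) nonadditive capacity can.
alternatives = ["A_red", "A_black", "B_red", "B_black"]

# Additivity plus indifference within each urn forces every value to 0.5,
# so no strict preference for urn A can be represented.
P = {a: 0.5 for a in alternatives}
assert P["A_red"] == P["B_red"] and P["A_black"] == P["B_black"]

# A capacity (values are assumptions): within-urn values need not sum to one.
nu = {"A_red": 0.5, "A_black": 0.5, "B_red": 0.4, "B_black": 0.4}
assert nu["A_red"] == nu["A_black"] and nu["B_red"] == nu["B_black"]  # color indifference
assert nu["A_red"] > nu["B_red"] and nu["A_black"] > nu["B_black"]    # urn A strictly preferred
```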
(iii) W. Edwards (Violating Bayes’ updating rule, 1968)
Consider the following experiment. There are two boxes: one contains 700 red
balls and 300 blue balls, and the other contains 300 red balls and 700 blue balls.
Subjects know the composition of these boxes. Subjects choose at random one
of the boxes, and draw, say, 12 balls (without replacement) from it. Based upon
the results of their draws, they are asked to identify which box they were drawing
(which box is more “likely” to them).
Suppose the 12 draws resulted in 8 red balls and 4 blue balls. It turns out
that most of the subjects gave an answer between 70% and 80% for the box
with more red balls, which is inconsistent with the likelihood 97% given by the
Bayesian updating formula.
Remark
(1) The computation of the posterior probability 97% is carried out as follows.
Denote by box I the one with more red balls, and by box II the other. The prior
probabilities are P(I) = P(II) = 1/2. Let X denote the number of red balls drawn
in a sample of size n = 12. Its distribution is a hypergeometric H(N, D, n), with
N = 1,000, D = 700 for box I, and D = 300 for box II. Specifically,

P(X = k) = \binom{D}{k} \binom{N - D}{n - k} \Big/ \binom{N}{n}

We seek P(I | X = 8), which is

P(I \mid X = 8) = \frac{P(X = 8 \mid I)\, P(I)}{P(X = 8 \mid I)\, P(I) + P(X = 8 \mid II)\, P(II)}

Note that the Bayesian updating procedure is based upon additivity of prob-
ability measure, also known as the “law of total probability”.
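For concreteness, here is a small Python sketch (ours) of the computation just described, using scipy's hypergeometric distribution; it reproduces the roughly 97% posterior quoted above.

```python
# Posterior probability of box I (700 red) after drawing 8 red balls in 12 draws.
from scipy.stats import hypergeom

N, n, k = 1000, 12, 8
like_I = hypergeom(N, 700, n).pmf(k)    # P(X = 8 | box I)
like_II = hypergeom(N, 300, n).pmf(k)   # P(X = 8 | box II)

posterior_I = 0.5 * like_I / (0.5 * like_I + 0.5 * like_II)
print(round(posterior_I, 3))            # approximately 0.97
```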
(2) Recall that the Bayesian approach to decision-making under “uncertainty”
is in fact under “risk”, where, according to Knight (1921), uncertainty and risk
are two different things. The distinction is this. Risk refers to situations where
probabilities are known, whereas uncertainty refers to situations in which proba-
bilities are neither known, nor can be deduced or estimated in an objective way.
The Bayesian approach minimizes the importance of this distinction by introduc-
ing the concept of subjective probability, and proceeds as follows: when facing
Knightian uncertainty, just use your own subjective probabilities, so that the prob-
lem of decision-making under uncertainty becomes a problem under risk. Clearly,
the problem with the Bayesian approach to decision-making is this. Do people
always have “probability beliefs” over any source of uncertainty (say, to update
by Bayes’ rule)?
With respect to the Bayesian approach to uncertainty analysis and its appli-
cations, there are recent works, such as [15] revealing that “its axiomatic foun-
dations are not as compelling as they seem, and that it may be irrational to
follow this approach”. This could be so since, basically, the Bayesian paradigm
commands that “when you face any source of uncertainty (epistemic or objec-
tive), you should quantify it probabilistically; in the absence of objective prob-
abilities, you should “have” your own, subjective probabilities (to guide your
decisions), and if you don’t know what the probabilities are, you should adopt
some probabilities" (you are not allowed to say "I don't know"!). But then "such
a choice would be arbitrary, and therefore a poor candidate for a rational mode
of behavior”. As such, considerations of beliefs that cannot be quantified by
Bayesian priors, and of updating of non-Bayesian beliefs have been investigated
in the literature. See the next section.
3 Some Nonadditive Probabilities


An excellent place to see the evolution of von Neumann’s expected utility, mainly,
from (subjective) additive probability to nonadditive probability, is to follow the
journey that Fishburn traced for us, from 1970 [13] to 1988 [12]. Of course,
another text is Kreps [20] (p. 198) with his “vision” in 1988 for the future:
“These data provide a continuing challenge to the theorist, a challenge to
develop and adapt the standard models so that they are more descriptive of what
we see. It will be interesting to see what will be in a course on choice theory in
ten or twenty years time”.
Remember, we are now exactly 30 years later! What is the state-of-the-art of
the theory of choice? We will elaborate on this question throughout this paper,
but first, here is a flavor of how researchers have reacted to the additivity axiom
of standard probability. A good summary is in [31].
Perhaps Dempster [5] should be credited for emphasizing the upper and lower probability concepts, as nonadditive set functions for modeling uncertainty, triggering Shafer [30] to develop a "mathematical theory of evidence". If P is a probability measure on a measurable space (Ω, A), then the set function P∗ : 2^Ω → [0, 1], defined by P∗(B) = sup{P(A) : A ⊆ B, A ∈ A}, satisfies P∗(∅) = 0, P∗(Ω) = 1, and is monotone of infinite order, i.e., for any k ≥ 2 and B1, B2, ..., Bk in the power set 2^Ω, with |I| denoting the cardinality of the set I,

P_*\left(\bigcup_{j=1}^{k} B_j\right) \;\ge \sum_{\emptyset \neq I \subseteq \{1, 2, ..., k\}} (-1)^{|I|+1}\, P_*\left(\bigcap_{i \in I} B_i\right)

Note that this monotonicity of infinite order is nothing else than a weakening
of Poincaré's (inclusion–exclusion) equality for probability measures. Clearly, P∗(.) is a nonadditive
set function. More concretely, if X is a random set (i.e., a random element
whose values are sets) taking values as subsets of a set U, then its distribution
F(.) : 2^U → [0, 1], F(A) = P(X ⊆ A), behaves exactly like P∗; see, e.g., [24].
Thus, although P∗ and F are nonadditive, they are somewhat related to additive
probability measures. It should be recalled that the Bayesian paradigm holds
that any source of uncertainty can and should be quantified by probabilities.
For example, if you do not have objective probabilities, you should have your
own subjective probabilities (to guide your decisions). Nonadditive measures of
uncertainty are designed when Bayesian prior cannot be quantified.
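As a small illustration (the mass function below is our own assumption, not from the paper), the distribution F(A) = P(X ⊆ A) of a random set is monotone but clearly nonadditive:

```python
# F(A) = P(X ⊆ A) for a random set X on U = {1, 2, 3} with an assumed
# distribution; F is monotone but not additive.
U = frozenset({1, 2, 3})
mass = {frozenset({1}): 0.2, frozenset({2, 3}): 0.5, U: 0.3}   # P(X = S)

def F(A):
    return sum(m for S, m in mass.items() if S <= A)   # S <= A means S ⊆ A

A, B = frozenset({1, 2}), frozenset({2, 3})
print(F(A | B), F(A) + F(B) - F(A & B))   # 1.0 vs 0.7: superadditive, not additive
```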
A nonadditive measure of uncertainty, called a possibility measure [34], arises in the context of Zadeh's theory of fuzzy sets (as opposed to ambiguity, fuzziness or vagueness refers to situations where it is difficult to form any interpretation at the desired level of specificity). It is defined axiomatically as π(.) : 2^U → [0, 1] with π(∅) = 0, π(U) = 1, and, for any family {Ai}i∈I of subsets of U, π(∪i∈I Ai) = sup{π(Ai) : i ∈ I}.
With respect directly to the problem of additivity in von Neumann’s expected
utility theory, it was Schmeidler [27,28] who provided a significant framework
for nonadditive probability. It is all because of Knightian uncertainty in eco-
nomics (or “ambiguity”, a type of meaning in which several interpretations are
plausible). Essentially, Schmeidler’s work on decisions was based upon general
set functions, not necessarily additive, to model ambiguous beliefs in economics,
see also [22].
A general form of nonadditive set functions is Choquet capacities (general-
izations of additive measures), e.g., [18], and its associated Choquet integral to
be used as a nonadditive expected utility concept [31].
A capacity on a measurable space (Ω, A ) is a set function ν(.) : A → [0, 1]
such that ν(∅) = 0, ν(Ω) = 1, and monotone increasing, i.e., A ⊆ B =⇒ ν(A) ≤
ν(B). The Choquet integral of a (real-valued) random variable X is defined as

C_\nu(X) = \int_0^{\infty} \nu(X > t)\, dt + \int_{-\infty}^{0} \left[\nu(X > t) - 1\right] dt

The Choquet integral is clearly not additive, but it is "comonotonic additive"
in the following sense. Two random variables X, Y are said to be comonotonic
if for any ω, ω′ we have [X(ω) − X(ω′)][Y(ω) − Y(ω′)] ≥ 0. For any capacity
ν(.), if X, Y are comonotonic, then Cν(X + Y) = Cν(X) + Cν(Y). This addi-
tivity is referred to as comonotonic additivity of Choquet integral. Schmeidler’s
subjective probability and expected utility without additivity [28] is based on
his integral representation without additivity [27] which we reproduce here. Let
B be the class of real-valued bounded random variables, defined on (Ω, A ). Let
H be a functional on B such that H(1Ω ) = 1, H(.) is increasing and comono-
tonic additive, then H(.) is a Choquet integral operator, i.e., H(.) = Cν (.) with
ν being a capacity defined on A by ν(A) = H(1A ). Note that Choquet inte-
grals are used as models for risk measures in financial econometrics, see [31] for
economic applications.
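To make the definitions concrete, here is a minimal Python sketch (the capacity and random variables below are our own toy assumptions) of the discrete Choquet integral; it exhibits both the failure of additivity in general and comonotonic additivity.

```python
# Discrete Choquet integral C_nu(X) for nonnegative X on a finite space.
def choquet(X, nu, omega):
    total, prev = 0.0, 0.0
    for t in sorted(set(X.values())):
        level = frozenset(w for w in omega if X[w] >= t)   # upper level set {X >= t}
        total += (t - prev) * nu[level]
        prev = t
    return total

omega = ("rain", "sun")
# A toy capacity: monotone, nu(empty) = 0, nu(omega) = 1, but not additive.
nu = {frozenset(): 0.0, frozenset({"rain"}): 0.3,
      frozenset({"sun"}): 0.3, frozenset(omega): 1.0}

X = {"rain": 1.0, "sun": 0.0}
Y = {"rain": 0.0, "sun": 1.0}   # X and Y are not comonotonic
Z = {"rain": 2.0, "sun": 1.0}   # X and Z are comonotonic

XY = {w: X[w] + Y[w] for w in omega}
XZ = {w: X[w] + Z[w] for w in omega}
print(choquet(XY, nu, omega), choquet(X, nu, omega) + choquet(Y, nu, omega))  # 1.0 vs 0.6
print(choquet(XZ, nu, omega), choquet(X, nu, omega) + choquet(Z, nu, omega))  # 1.6 vs 1.6
```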
Remark. It is interesting to note that, even at the present time, and without taking into account the psychological evidence spelled out in the next section, there is still research on topics such as set functions, capacities, "nonclassical" measure theory, and nonadditive measure theory for decision theory and preference modeling. These set functions are nonadditive but still monotone increasing. As such, they cannot accommodate the "conjunction fallacy" and do not capture the noncommutativity of information.

4 Some Facts from Psychology

Before pointing out some basic evidence which changes the way we used to use
the standard probability calculus, let us recall that behavioral economic
theory is based essentially on prospect theory [19] which, in the context of
financial econometrics, states that people make decisions based upon values of
gains and losses (rather than final outcomes) and evaluate these quantities using
“heuristics”.
People make decisions by using heuristics. Is reasoning with heuristics irra-
tional? Well, first of all, rational or not, that is the way people decide! Secondly,
rationality is understood (defined) in a mathematical way, e.g., via the notion of
expected utility. Think about the analogy with Newtonian and quantum mechan-
ics. If particles move the way they do, are they irrational since they do not obey
Newton’s laws?
Perhaps a word about terminology is useful. In the field of Artificial Intelli-
gence, a fuzzy logic is not “a logic which is fuzzy”, but “it is a logic of fuzzy con-
cepts”. Here, by “quantum decision-making”, we mean “decision-making based
upon quantum probability”, where “quantum probability”, as Richard Feyn-
man pointed out, is a probability calculus similar to the one used in quantum
mechanics (nonadditive and noncommutative). The concept of “free will” of peo-
ple is used by analogy with the intrinsic randomness of particle motion, and not
any physical analogies. To predict human behavior (in making decisions under
uncertainty), academics use probabilities. For each probability calculus, we have
a defined notion of “expected utility” for rationality!
In view of various violations of expected utility (as a rational way to make
decisions), behavioral decision theory was developed as an integration of psy-
chology and economics. Technically speaking, it is about developing appropriate
“probability calculus” in a given environment. In general, it “looks” like people
use “likelihood” to make their decisions under uncertainty. But likelihood is not
probability! Recall that the concept of likelihood was formulated by Fisher in his
theory of statistical estimation of population parameters. Tou counter Bayesian
approach to statistical estimation, Fisher “talked” about turning likelihood con-
cept into his “fiducial probability” (probability based on faith) with no success!
But psychologists performed experiments revealing that likelihood is in fact
not only nonadditive, but also not monotone increasing (as a set function),
so that all the above nonadditive probability calculi seem not to be adequate for
modeling the way people make decisions.
The so-called “conjunction fallacy” in the literature [19] is this. A lady named
Linda was known as an active feminist in the past. Consider now the event A =
“She is active in the feminist movement”, and B = “She is a bank teller”.
Subjects are asked to guess the likelihoods of A, B, and A ∩ B. It turns out that
subjects judged A ∩ B to be more likely than B.
Another more important evidence is the so-called “order effect” [4] exhibiting
the noncommutativity of events, and hence affecting the probability calculi in
standard approaches.
Perhaps this "order effect", when put in the context of non-Boolean
logics, calls for a radical way of thinking about axioms of probability measures,
i.e., looking for a probability calculus consistent with cognitive behavior. Clearly,
on top of all, it boils down to construct a new probability calculus which is
noncommutative (and nonadditive), and yet, it is a generalization of standard
probability calculus. Well, as we always borrow concepts and methods from the physical
sciences to apply to the social sciences, especially to economics, we have available a
probability calculus suitable for our needs, and that is called quantum probability
calculus. We elaborate on it a bit next.
5 How to Construct a Noncommutative Probability Calculus?
We wish to extend the standard probability calculus to a noncommutative one.
This type of generalization procedure is familiar in mathematics: if we cannot
extend a concept directly (e.g., a set to a fuzzy set), we do that indirectly,
namely look at some equivalent representation of that concept which can be
more suitable for extension.
Following David Hilbert’s advice “What is clear and easy to grasp attracts
us, complications deter”, let’s first consider the simplest case of Kolmogorov
probability, namely the finite sample space, representing a random experiment
with a finite number of possible outcomes, e.g., a roll of a pair of dice. A finite
probability space is a triple (Ω, A , P ) where Ω = {1, 2, ..., n}, say, i.e., a finite
set with cardinality n, A is the power set of Ω (events), and P : A → [0, 1]
is a probability measure (P (Ω) = 1, and P (A ∪ B) = P (A) + P (B) when
A ∩ B = ∅). Note that since Ω is finite, the set-function P is determined by
the density ρ : Ω → [0, 1], ρ(j) = P({j}), with \sum_{j=1}^{n} \rho(j) = 1. A real-valued
random variable is X : Ω → R. In this finite case, of course, X^{-1}(B(R)) ⊆ A.
The domain of P is the σ-field A of subsets of Ω (events) which is Boolean
(commutative: A ∩ B = B ∩ A), i.e., events are commutative, with respect to
intersection of sets. We wish to generalize this setting to a non commutative one,
where “extended” events could be, in general, non commutative, with respect to
an “extension” of ∩.
For this, we need some appropriate equivalent representation for all elements
in this finite probability setting. Now since Ω = {1, 2, ..., n}, each function X :
Ω → R is identified as a point in the (finitely dimensional Hilbert) space Rn,
namely (X(1), X(2), ..., X(n))^t, which, in turn, is equivalent to an n × n diagonal
matrix with diagonal terms X(1), X(2), ..., X(n) and zeros elsewhere (a special
symmetric matrix), i.e.,

X \Longleftrightarrow [X] = \begin{bmatrix} X(1) & & & 0 \\ & X(2) & & \\ & & \ddots & \\ 0 & & & X(n) \end{bmatrix}

The set of such matrices is denoted as Do which is a commutative (with
respect to matrix multiplication) subalgebra of the algebra of all n × n matrices
with real entries. As matrices act as (bounded, linear) operators from Rn → Rn ,
we have transformed (equivalently) random variables into operators on a Hilbert
space.
In particular, for each event A ⊆ Ω, its indicator function 1A : Ω → {0, 1} is
identified as an element of Do with diagonal terms 1A(j) ∈ {0, 1}. As such, each
event A is identified as an (orthogonal) projection on Rn, i.e., an operator T such
that T = T 2 = T ∗ (its transpose/adjoint). Finally, the density ρ : Ω → [0, 1]
is identified with the element [ρ] of Do with nonnegative diagonal terms, and
with trace tr([ρ]) = 1. An element of Do with nonnegative diagonal terms is a
positive operator, i.e., an operator T such that <T x, x> ≥ 0, for any x ∈ Rn
(where < ., . > denotes the scalar product of Rn ). Such an operator is necessarily
symmetric (self adjoint). Thus, a probability density is a positive operator with
unit trace. Thus, we have transformed the standard (Kolmogorov) probability
space (Ω, A , P ), with #(Ω) = n, into the triple (Rn , Po , ρ), where Po denotes
the subset of projections represented by elements of Do (i.e., with 0–1 diagonal
terms) which represent “ordinary” events; and ρ (or [ρ]), an element of Do , is a
positive operator with unit trace.
Now, keeping Rn as a finitely dimensional Hilbert space, we will proceed
to extend (Rn , Po , ρ) to a non commutative “probability space”. It suffices to
extend Do, a special set of symmetric matrices, to the total set of all n × n
symmetric matrices, denoted as S(Rn), so that a random variable becomes an
"observable", i.e., a self-adjoint operator on Rn; a "quantum event" is simply
an arbitrary projection on Rn , i.e., an element of P (the set of all projections);
and the probability density ρ becomes an arbitrary positive operator with unit
trace. The triple (Rn , P, ρ) is called a (finitely dimensional) quantum probability
space. We recognize that quantum probability is based upon a new language, not
real analysis, but functional analysis (i.e., not on the geometry of Rn , but on its
non commutative geometry, namely linear operators on it).
Clearly, in view of the non commutativity of matrix multiplication, quantum
events (i.e., projection operators) are non commutative, in general.
Let’s pursue a little further with this finite setting. When a random variable
X : Ω → R is represented by the matrix [X], its possible values are on the diag-
onal of [X], i.e., the range of X is σ([X]), the spectrum of the matrix (operator)
[X]. For A ⊆ Ω, Pr(A) is taken to be P([1A]) = \sum_{j \in A} \rho(j) = tr([ρ][1A]). More
[X]. For A ⊆ Ω, Pr(A) is taken to be P ([1A ]) = j∈A ρ(j) = tr([ρ][1A ]). More
generally, EX = tr([ρ][X]), exhibiting the important fact that the concept of
“trace” (of matrix/operator) replaces integration, a fact which is essential when
considering an infinitely dimensional (complex, separable) Hilbert space, such as
L2 (R3 , B(R3 ), dx) of squared integrable, complex-valued functions.
The spectral measure of a random variable X, represented by [X], is the
projection-valued "measure" ζ[X] : B(R) → P(Rn), ζ[X](B) = \sum_{X(j) \in B} \pi_{X(j)},
where πX(j) is the (orthogonal) projection on the space spanned by X(j). From
it, the "quantum" probability of the event (X ∈ B), for B ∈ B(R), is taken to
be P(X ∈ B) = \sum_{X(j) \in B} \rho(j) = tr([ρ]ζ[X](B)).
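A tiny numpy sketch (with numerical values chosen by us) may help make this "matrix dictionary" concrete: classical probabilities and expectations are recovered as traces.

```python
# Finite case n = 3: density, event, and random variable as matrices.
import numpy as np

density = np.array([0.2, 0.5, 0.3])        # classical density rho on {1, 2, 3}
rho = np.diag(density)                     # positive operator with unit trace
X = np.diag([1.0, 4.0, 9.0])               # random variable X as [X]
one_A = np.diag([1.0, 1.0, 0.0])           # event A = {1, 2} as the projection [1_A]

print(np.trace(rho @ one_A), density[:2].sum())         # P(A) = tr(rho [1_A]) = 0.7
print(np.trace(rho @ X), (density * np.diag(X)).sum())  # E X = tr(rho [X]) = 4.9
```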
The extension of the above to arbitrary (Ω, A , P ) essentially involves the
replacement of Rn by an infinitely dimensional, complex and separable Hilbert
space H. For details, see texts like Dirac (1948), Meyer [23], and Parthasarathy [26].
We have stated several times that quantum probability is non commutative
and non additive. We will make these properties more explicit now.
Recall that a quantum probability space is a triple (H, P(H), ρ), where
P(H) plays the role of quantum events, and for p ∈ P(H), its probability
is given by tr(ρp). Recall that observables are self adjoint operators on H, i.e.,
elements of S (H).
The probability measure μρ (.) = tr(ρ.) on P(H) is clearly non commutative
in general, since, for p, q ∈ P(H), they might not commute, i.e., pq ≠ qp, so
that tr(ρpq) ≠ tr(ρqp). Of course, that extends to non commuting observables
as well.
At the experiment level, the surprising non additivity of probability is
explained by the interpretation of the Schrodinger wave function ψ(x, t) as a
probability amplitude, i.e., the probability of finding an electron in a neigh-
borhood dx of R3 (at time t) is |ψ(x, t)|2 dx. The well-known two-slit experi-
ment reveals that, for two distinct holes A and B, the probability of finding
electrons when only A is open is PA = |ψA (x, t)|2 dx, and for B only open,
PB = |ψB(x, t)|²dx. When both holes are open, wave interference leads to
ψA∪B(x, t) = ψA(x, t) + ψB(x, t), so that PA∪B = |ψA∪B(x, t)|² = |ψA(x, t) +
ψB(x, t)|² ≠ |ψA(x, t)|² + |ψB(x, t)|².
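A one-line numerical check (the amplitudes are chosen by us purely for illustration) shows the point: with opposite phases the probabilities do not add.

```python
import math

psi_A = 1 / math.sqrt(2)     # assumed amplitude with only slit A open
psi_B = -1 / math.sqrt(2)    # assumed amplitude with only slit B open (opposite phase)

P_A, P_B = psi_A ** 2, psi_B ** 2      # 0.5 and 0.5
P_both = (psi_A + psi_B) ** 2          # 0.0: destructive interference
print(P_both, P_A + P_B)               # 0.0 vs 1.0 -- probabilities do not add
```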
It can be also seen from the probability measure μρ (.) = tr(ρ.) on P(H).
First, P(H) is not a Boolean algebra. It is a non distributive lattice, instead.
Indeed, in view of the bijection between projections and closed subspaces of H,
we have, for p, q ∈ P(H), p ∧ q is taken to be the projection corresponding to
the closed subspace R(p) ∩ R(q), where R(p) denotes the range of p; p ∨ q is the
projection corresponding to the smallest closed subspace containing R(p)∪R(q).
You should check that, in general, p ∧ (q ∨ r) ≠ (p ∧ q) ∨ (p ∧ r), unless they commute.
On (H, P(H), ρ), the probability of the event p ∈ P(H) is μρ (p) = tr(ρp),
and if A ∈ S (H), Pr(A ∈ B) = μρ (ζA (B)) = tr(ρζA (B)), for B ∈ B(R), where
ζA is the spectral measure of A (a projection-valued measure on B(R)). With
its spectral decomposition A = \sum_{\lambda \in \sigma(A)} \lambda P_\lambda, the distribution of A on σ(A) is
Pr(A = λ) = μρ(Pλ) = tr(ρPλ), noting that A represents a physical quantity.
Recall that on a Kolmogorov probability space (Ω, A , P ), the probability is
axiomatized as satisfying additivity: for any A, B ∈ A,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Now, on (H, P(H), ρ), where the “quantum probability” Q (under ρ), defined
as, for the “quantum event” p ∈ P(H), Q(p) = tr(ρp), does not, in general,
satisfy the analogue, for arbitrary p, q ∈ P(H),
Q(p ∨ q) = Q(p) + Q(q) − Q(p ∧ q)
i.e., Q(.) is not additive. This can be seen as follows. For operators f, g ∈ S (H),
their commutator is defined as
[f, g] = f g − gf
so that [f, g] ≠ 0 if f, g do not commute (i.e., fg ≠ gf), and [f, g] = 0 if they commute.
Then, you can check that
[p, q] = (p − q)(p ∨ q − p − q + p ∧ q)
exhibiting the equivalence
[p, q] = 0 ⇐⇒ p ∨ q − p − q + p ∧ q = 0
i.e., non commutativity is equivalent to “non additivity” (of operators).


Now, as Q(p) = tr(ρp), and by linearity of the trace operator, we see that

p ∨ q − p − q + p ∧ q = 0 ⟹ tr(ρ(p ∨ q − p − q + p ∧ q)) = Q(p ∨ q) − Q(p) − Q(q) + Q(p ∧ q) = 0,

which is the analogue of additivity for the quantum probability Q, for example
for p, q which commute.
The non additivity of quantum probability arises since, in general, p, q ∈
P(H) do not commute, i.e., [p, q] ≠ 0. In other words, the non additivity of
quantum probability is a consequence of the non commutativity of observables
(as self adjoint operators on a Hilbert space).
Remark. The quantum law of an observable A ∈ S (H) is given as μA (.) :
B(R) → [0, 1], μA (B) = tr(ρζA (B)), where ζA is the spectral measure of A.
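The following numpy sketch on H = R² (the density and projections are our own choices, made only to exhibit the point) shows both properties at once: the two projections do not commute, and Q fails the additivity identity.

```python
# Noncommuting projections in R^2 and the resulting nonadditivity of Q.
import numpy as np

rho = np.diag([0.7, 0.3])                      # density: positive, trace one
p = np.array([[1.0, 0.0], [0.0, 0.0]])         # projection onto span{e1}
q = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])   # projection onto span{e1 + e2}

print(np.allclose(p @ q, q @ p))               # False: [p, q] != 0

# For these two distinct one-dimensional ranges, R(p) ∩ R(q) = {0} and the
# span of R(p) ∪ R(q) is all of R^2, so p ∧ q = 0 and p ∨ q = I.
p_and_q, p_or_q = np.zeros((2, 2)), np.eye(2)

Q = lambda proj: float(np.trace(rho @ proj))
print(Q(p_or_q), Q(p) + Q(q) - Q(p_and_q))     # 1.0 vs 1.2: Q is not additive
```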

6 Towards Quantum Decision and Economic Models


In a paper such as this, we think it is more appropriate to give the audience
a big picture of the promising road ahead, as far as financial econometrics is
concerned, rather than some details and algorithms on actually how to apply
“quantum-like models” (which should be the story in another day).
When data (including economic data) are available, we look at them just as
a sample of a dynamic process, i.e., just examining how they fluctuated, and
not paying any attention to where they came from. In other words, when con-
ducting empirical research, regardless whether data are “natural phenomenon”
data or data having also some “cognitive” components (e.g., decisions from eco-
nomic agents/investors, traders in markets), we treat them the same way. Having
looked at data this way, we proceed (by tradition) simply by proposing stochas-
tic models to model their dynamics (for explanation and then prediction), such
as the well-known Black-Scholes model in financial econometrics. Clearly the
geometric Brownian motion model (describing the stochastic dynamics of asset
prices) captures randomness of natural phenomena, but does not incorporate
anything related to the effects of economic agents who are in fact responsible
for the fluctuations of the prices under consideration. As such, does a “tradi-
tional” stochastic model in econometrics really describe the dynamics on which
all conclusions will be derived?
Stephen Hawking nicely reminded us [17] that, following natural sciences
(i.e., physics), we should view economics (a social science) as an “effective the-
ory”, i.e., there is another important factor to take into account when proposing
a model (not a “law” yet!) for dynamics of economic variables, and that is deci-
sions of economic agents (“thinking individuals”, from the existence of their free
will). It is, at least partially, because of this that behavioral economics started
getting the attention of researchers. Of course, the problem arises because, so far,
unlike, say, quantum mechanics, predictions in economics were not that success-
ful (!), as Hawking nicely qualified it as “moderate”. Should we ask “why?”.
For example, financial econometrics is dominated by the so-called “efficient
market hypothesis” under the influence of P.A. Samuelson and E.F. Fama, which
is based upon the “assumption” that investors act rationally and without bias
(and new information appears at random, and influences economic prices at
random). As a consequence, using standard probability calculus, martingales are
models for dynamics of asset prices, resulting in the conclusion that “trading on
stock market is just a game of chance (luck) and not a game of skill”, despite
empirical evidence revealing that “stock dynamics is predictable to some degree”.
It is all about prediction. But prediction is a consequence of our modeling
process. Should we take a closer look at the way we used to model financial
dynamics? Obviously, we adapt (follow) concepts and methods in natural sci-
ences to social sciences, but not “completely”. The delicate difference between
Newtonian mechanics and quantum mechanics was ignored in econometrics mod-
eling. Of course, we do not “equate” the intrinsic randomness of particle motion
with the free will of economic agents’s mind (in making decisions). But, if, unlike
Newtonian mechanics, quantum mechanics is random so that, dynamics, trajec-
tories of particles should be formulated differently, then the same spirit should
be used in economic modeling.
But as Richard Feynman pointed out to us [11], when dealing with the ran-
domness of particles, we need another probability calculus. Of course that was
his only message to probabilists and statisticians, without knowing that standard
probability and statistics would later invade empirical research in economics. The
quantum probability calculus seems strange (i.e., not applicable) to standard
statistical practices, because quantum probability exhibits “nonadditivity” and
“noncommutativity”. Well, Hawking did tell us that we have to pay attention
to psychologists because they are there precisely to help econometricians! Both
nonadditivity and noncommutativity of a measure of fluctuations were discov-
ered by psychologists, invalidating expected utility in the first place. The shift to
nonadditive measures (in human decision-making affecting economic data)
started a long time ago, but it looks like a separate effort only for decision
theory, with no incorporation into econometric analysis. As pointed out in
this present paper, nonadditive measures, such as Choquet capacities, are not
adequate as a measure of fluctuations (of economic data) since they are still
increasing set functions, and commutative. It is right here that we should follow
physics “completely” by using quantum probability calculus in economic anal-
ysis. Recent literature shows promising research in this direction. Our hope, in
an exposition such as this, is that those econometricians who are not yet aware
of this revolutionary vision will start to consider it seriously.

References
1. Allais, M.: Le comportement de l’homme rationnel devant le risque: Critique des
postulats et axiomes de l’ecole americaine. Econometrica 21(4), 503–546 (1953)
2. Baaquie, B.E.: Quantum Finance. Cambridge University Press, Cambridge, New
York (2004)
3. Briggs, W.: Uncertainty: The Soul of Modeling, Probability and Statistics.
Springer, New York (2016)
4. Busemeyer, J.R., Bruza, P.D.: Quantum Models of Cognition and Decision.
Cambridge University Press, Cambridge (2012)
5. Dempster, A.: Upper and lower probabilities induced by a multivalued mapping.
Ann. Math. Stat. 38, 325–339 (1967)
6. Denneberg, D.: Non-additive Measure and Integral. Kluwer Academic Press,
Dordrecht (1994)
7. Derman, E.: My Life as a Quant: Reflections on Physics and Finance. Wiley,
Hoboken (2004)
8. Diaconis, P., Skyrms, B.: Ten Great Ideas About Chance. Princeton University
Press, Princeton (2018)
9. Ellsberg, D.: Risk, ambiguity, and the savage axioms. Q. J. Econ. 75(4), 643–669
(1961)
10. Fagin, R., Halpern, J.Y.: Uncertainty, belief and probability. Comput. Intell. 7,
160–173 (1991)
11. Feynman, R.: The concept of probability in quantum mechanics. In: Berkeley Sym-
posium on Mathematical Statistics and Probability, pp. 533–541 (1951)
12. Fishburn, P.C.: Non Linear Preference and Utility Theory. Wheatsheaf Books,
Brighton (1988)
13. Fishburn, P.C.: Utility Theory for Decision Making. Wiley, New York (1970)
14. Gelman, A., Betancourt, M.: Does quantum uncertainty have a place in everyday
applied statistics? Behav. Brain Sci. 36(3), 285 (2013)
15. Gilboa, I., Marinacci, M.: Ambiguity and the Bayesian paradigm. In: Acemoglu,
D. (ed.) Advances in Economics and Econometrics, pp. 179–242. Cambridge Uni-
versity Press, Cambridge (2013)
16. Haven, E., Khrennikov, A.: Quantum Social Science. Cambridge University Press,
Cambridge (2013)
17. Hawking, S., Mlodinow, L.: The Grand Design. Bantam Books, London (2010)
18. Huber, P.J.: The use of Choquet capacities in statistics. Bull. Inst. Int. Stat. 4,
181–188 (1973)
19. Kahneman, D., Tversky, A.: Prospect theory: an analysis of decision under risk.
Econometrica 47, 263–292 (1979)
20. Kreps, D.M.: Notes on the Theory of Choice. Westview Press, Boulder (1988)
21. Lambertini, L.: John von Neumann between physics and economics: a methodolog-
ical note. Rev. Econ. Anal. 5, 177–189 (2013)
22. Marinacci, M., Montrucchio, L.: Introduction to the mathematics of ambiguity. In:
Gilboa, I. (ed.) Uncertainty in Economic Theory, pp. 46–107. Routledge, New York
(2004)
23. Meyer, P.A.: Quantum Probability for Probabilists. Lecture Notes in Mathematics.
Springer, Heidelberg (1995)
24. Nguyen, H.T.: On random sets and belief functions. J. Math. Anal. Appl. 65(3),
531–542 (1978)
25. Nguyen, H.T., Walker, A.E.: On decision making using belief functions. In: Yager,
R., Kacprzyk, J., Fedrizzi, M. (eds.) Advances in the Dempster-Shafer Theory of
Evidence, pp. 331–330. Wiley, New York (1994)
26. Parthasarathy, K.R.: An Introduction to Quantum Stochastic Calculus. Springer,
Basel (1992)
27. Schmeidler, D.: Integral representation without additivity. Proc. Am. Math. Soc.
97, 255–261 (1986)
28. Schmeidler, D.: Subjective probability and expected utility without additivity.
Econometrica 57(3), 571–587 (1989)
29. Segal, W., Segal, I.E.: The Black-Scholes pricing formula in the quantum context.
Proc. Nat. Acad. Sci. 95, 4072–4075 (1998)
30. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press,
Princeton (1976)
31. Sriboonchitta, S., Wong, W.K., Dhompongsa, S., Nguyen, H.T.: Stochastic Domi-
nance and Applications to Finance, Risk and Economics. Chapman and Hall/CRC
Press, Boca Raton (2010)
32. Von Neumann, J., Morgenstern, O.: The Theory of Games and Economic Behavior.
Princeton University Press, Princeton (1944)
33. Walley, P.: Statistical Reasoning with Imprecise Probabilities. Chapman and Hall,
London (1991)
34. Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibility. J. Fuzzy Sets Syst. 1,
3–28 (1978)
My Ban on Null Hypothesis Significance
Testing and Confidence Intervals

David Trafimow

Department of Psychology, New Mexico State University, MSC 3452,
P. O. Box 30001, Las Cruces, NM 88003-8001, USA
dtrafimo@nsu.edu

Abstract. The journal, Basic and Applied Social Psychology, banned null
hypothesis significance testing and confidence intervals. Was this justified, and
if so, why? I address these questions with a focus on the different types of
assumptions that compose the models on which p-values and confidence
intervals are based. For the computation of p-values, in addition to problematic
model assumptions, there also is the problem that p-values confound the
implications of sample effect sizes and sample sizes. For the computation of
confidence intervals, in contrast to the justification that they provide valuable
information about the precision of the data, there is a triple confound involving
three types of precision. These are measurement precision, precision of homo-
geneity, and sampling precision. Because it is possible to estimate all three
separately, provided the researcher has tested the reliability of the dependent
variable, there is no reason to confound them via the computation of a confi-
dence interval. Thus, the ban is justified both with respect to null hypothesis
significance testing and confidence intervals.

Keywords: Null hypothesis significance testing · Confidence intervals · Models · Model assumptions · Inferential assumptions · Precision

In my new position as Executive Editor of the journal, Basic and Applied Social Psy-
chology (BASP) in 2014, I discouraged researchers from performing the null hypothesis
significance testing (NHST) procedure (Trafimow 2014). However, the 2014 editorial
had very little discernible effect on BASP submissions, social psychology, or science
more generally. Stronger measures were needed, so the following year I banned NHST
from BASP (Trafimow and Marks 2015). At first, most of the reaction I received was
strongly negative. Many people emailed me that the ban would destroy BASP, and a few
even expressed that the ban would destroy social psychology as a respectable area of
scientific inquiry. But did the critics exaggerate the negative effects of the ban?
As time eventually showed, the critics did not exaggerate the amount of attention
that would be paid to the editorial. Within a few months, there were over 100,000 hits
on the editorial on the BASP website, the editorial was cited countless times, and NHST
was an important topic at the American Statistical Association Symposium on Statis-
tical Inference in October of 2017, at which I presented. But in another way, the critics
did exaggerate or were just plain wrong. The ban certainly did not destroy BASP; in
fact, the impact factor more than doubled. Nor did the ban destroy social psychology as
a respectable area of scientific inquiry as, to my knowledge, no one has written an

article suggesting that the ban is a reason for reducing belief in the respectability of
social psychology as a field of science. Much more damaging to the credibility of social
psychology, and the soft sciences more generally, is that there has been a replication
crisis (e.g., Earp and Trafimow 2015; Open Science Collaboration 2015; Trafimow, in
press).
On the contrary, much that is good has followed the ban. Dramatically
increased discussion has ensued not only in social psychology, but in many areas of
science about the (in)validity of NHST and the possibility of alternative procedures.
Returning to the Symposium on Statistical Inference, most of the speakers strongly
criticized NHST and favored the consideration of alternative procedures. And there are
new efforts in different countries, in different areas of science, to eliminate NHST, such
as a Netherlands effort in the life and health sciences. In addition, statistics textbooks
now exist, either published or in the process of being published, that discourage
NHST.1 In contrast to the expressions of negativity that immediately followed the ban,
much is changing across the sciences, and very much for the better. Some of the
presentations at the present conference TES2019 provide positive examples.
What is NHST?
To understand what is wrong with NHST, it is first necessary to understand the p-value
and the place that the p-value has in NHST. The American Statistical Association
provided a nice characterization (Wasserstein and Lazar 2016, p. 131): “Informally, a
p-value is the probability under a specified statistical model that a statistical summary
of the data (e.g., the sample mean difference between two compared groups) would be
equal to or more extreme than its observed value” (italics added). The temptation, of
course, and what most of the experts warn against but do themselves anyhow, is to make
an inverse inference that if the p-value is low, the probability of the model is low.2 The
fallacy is so pervasive that it has a name: the inverse inference fallacy. A quick way to
dramatize the fallacy is to consider the probability that a person is president of the USA
given that the person is an American citizen; versus the probability that someone is an
American citizen given that the person is president of the USA. The former conditional
probability is extremely low whereas the latter conditional probability is extremely high
(according to the American Constitution it is 1.00). Analogously, even though the p-value
might be a low number, that does not mean that the statistical model need be
unlikely to be true. From the point of view of strict logic, the probability of the model
could be any number between 0 and 1, at least from a Bayesian perspective that models
can have probabilities. From a strict frequentist point of view, the model is either
correct (probability = 1) or incorrect (probability = 0) but we may not know which.
Thus, whether a Bayesian or frequentist view is taken, it would seem there is no
justification for drawing a conclusion about the probability of the model or making an
accept/reject decision, respectively. For a theoretical analysis of why the testing pro-
cedure using p-values is not valid, see Nguyen (2016).

1. An example is the book by Briggs (2016), who is a distinguished participant at TES2019.
2. Richard Morey, in his blog (http://bayesfactor.blogspot.com/2015/11/neyman-does-science-part-1.html), has documented how even Neyman was unable to avoid misusing p-values in this way, though he warned against it himself.
But there is an important frequentist exception that invokes the notions of Type I
and Type II error. A Type I error is when the model is true and the researcher rejects it.
A Type II error is when the model is false and the researcher fails to reject it. The idea
of NHST, then, is to set an alpha level—usually .05—that serves as a threshold. If the
p-value comes in below the threshold, the researcher rejects the model whereas if the p-
value does not come in below the threshold, the researcher fails to reject the model
(which is not the same thing as accepting the model). By setting the alpha level, say at
.05, the researcher can be confident of making a Type I error only 5% of the time when
the model is true. This NHST strategy is touted as the way to sidestep the inverse
inference fallacy.
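As a hedged illustration of this long-run guarantee (a simulation of our own, under an idealized model in which every assumption holds exactly), the rejection rate at an alpha of .05 is indeed about 5% when the full model is true:

```python
# When the model is exactly true, p < .05 occurs in roughly 5% of replications.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
alpha, n_reps, rejections = 0.05, 10_000, 0

for _ in range(n_reps):
    group1 = rng.normal(0.0, 1.0, size=30)   # identical populations: model true
    group2 = rng.normal(0.0, 1.0, size=30)
    _, p_value = ttest_ind(group1, group2)
    rejections += p_value < alpha

print(rejections / n_reps)   # close to 0.05, the advertised Type I error rate
```

Of course, as argued below, this guarantee is conditional on model assumptions that are never exactly true in the soft sciences.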
What is the Model?
If NHST provides an elegant way to sidestep the inverse inference fallacy, why am I
nevertheless against it? There is a long litany of reasons reviewed by Trafimow and
Earp (2017) that need not be repeated here. But it is worthwhile to hit on the most
interesting problem, the meaning of the model (Trafimow, submitted).
As exemplified in the foregoing quotation from the American Statistical Associa-
tion, although statisticians talk virtuously about the importance of recognizing that a
computed p-value is relative to a model, these same statisticians fail to consider ade-
quately the different types of assumptions that go into the model. As I explained
recently (Trafimow, submitted manuscript), the different types of assumptions that go
into the model change depending on the type of research being done. For research
designed to test theories, there are at least four categories of assumptions bullet-listed
below:
• theoretical assumptions,
• auxiliary assumptions,
• statistical assumptions, and
• inferential assumptions.
Theoretical assumptions refer to assumptions that the theorist makes by choosing a
theory or set of theories from which to work; the assumptions in the theory or set of
theories are theoretical assumptions. At least some theoretical assumptions refer to
nonobservational terms. For example, in Newton’s theory, although weight is an
observational term, mass is a nonobservational term; and the difference can be seen
easily merely by considering that the same object would have the same mass, but
different weights, on different planets.
Auxiliary assumptions are not embedded in theories themselves but are necessary
to connect nonobservational terms in theories with observational terms in empirical
hypotheses. For example, when Halley (1656–1742) used Newton's theory to predict
the reappearance of the comet that now bears his name, he made assumptions about the
presence or absence of astronomical bodies, the present position of the comet, and so
on. These auxiliary assumptions were not in Newton’s theory but were necessary to use
the theory to make predictions.
Statistical assumptions concern the summary statistics researchers use. If a mar-
keting researcher were to predict greater purchasing intentions in the advertising
condition than in the control condition, she might propose a statistical hypothesis of a
difference in means. Note that she could use other summary statistics such as medians,
modes, data frequencies above or below arbitrary percentiles, and many others. The
choice of summary statistics necessitates assumptions, even if these are tacit, about
why the summary statistics chosen are better suited for the researcher’s purposes than
other types of summary statistics. Trafimow et al. (2018; also see Speelman and
McGann 2016) have shown that different summary statistics can lead to opposing
conclusions. Thus, the issue of choosing summary statistics is much more important
than researchers typically realize.
Finally, there are inferential assumptions. To arrive at a p-value, it is necessary to
assume random and independent sampling from a defined population (Berk and
Freedman 2003). Depending on one’s path to the p-value, it may also be necessary to
make assumptions about the population distributions, linearity, that participants were
randomly assigned to conditions, that the manipulation “took” for all participants, or
that there was no systematic invalidity in the measurements. I emphasize that at least
some inferential assumptions are guaranteed wrong in the soft sciences, such as the
assumption of random and independent sampling from a defined population (Berk and
Freedman 2003). The a priori wrongness of at least some inferential assumptions is
going to play an important part in the argument to be made.
In summary, the model is an immense monstrosity that includes theoretical
assumptions, auxiliary assumptions, statistical assumptions, and inferential assump-
tions. Theoretical assumptions include nonobservational terms, and auxiliary
assumptions connect nonobservational terms in theories to observational terms in
empirical hypotheses. Sometimes there is no need for statistical assumptions, such as
when Halley predicted the reappearance of a comet, but sometimes statistical
assumptions are necessary, as is typical in the soft sciences. Statistical assumptions
connect observational terms in empirical hypotheses to the specific summary statistics
to be computed. Finally, to bridge the gap from summary statistics to p-values, it is
necessary to make inferential assumptions.
Consequences of the Model Assumptions
The most obvious consequence of the many model assumptions, including inferential
assumptions, is that, at least in the soft sciences, the model is guaranteed to be wrong.
Box and Draper (1987) stated the issue succinctly, “Essentially, all models are wrong,
but some are useful” (p. 424). Well, then, given that the model is known wrong a priori,
it is reasonable to query whether there is any point in testing it via p-values. The usual
apology for p-values is to admit that the model is wrong a priori, but to argue, con-
sistent with the Box and Draper quotation, that it nevertheless might be close enough to
correct to be useful. Once one grants that the model might be close to being correct,
even though the model is incorrect, it seems reasonable to argue for the worth of
p-values to test model closeness to correctness.
But there is an important flaw in the p-value apology, which is that p-values are
computed conditionally upon the model being assumed correct, not upon the model
being assumed close to correct. The unfortunate fact of the matter is that there is no
way to compute a p-value based on the assumption that the model is close to correct.
As an example, it is obvious that the inferential assumption of random and independent
sampling from a defined population is never true in the soft sciences (Berk and
Freedman 2003); yet this assumption is a component of the model. Well, then, how
would one compute a p-value based on the notion of various degrees of closeness of
this assumption, not to mention the myriad additional inferential assumptions? Let me
be clear that I am not advocating that researchers always should randomly sample from
defined populations, only that this is an inferential assumption that must be made to
justify the p-value to be computed.3 The larger point, which is worth reiterating, is that
there is no reason for researchers interested in model closeness to compute p-values,
nor to engage in NHST.
From a philosophy of science perspective, matters are worse than I have related
thus far. Consider luminaries such as Duhem (1954) and Lakatos (1978) who
emphasized that even if obtained data seem to falsify a theory, the failure to predict can
be attributed to auxiliary assumptions, as well as to the theory. Thus, it is not clear
whether to blame the theory or auxiliary assumptions for the empirical defeat for the
theory. In addition, I have recently demonstrated that a similar problem ensues in the
event of an empirical victory (Trafimow 2017a); the victory could be credited to the
excellence of the theory or the auxiliary assumptions. And this work by Duhem,
Lakatos, and myself only considers theoretical and auxiliary assumptions; but not
statistical or inferential assumptions that add additional layers of complexity to an
already challenging process of evaluating theories based on empirical defeats or vic-
tories. Considering statistical assumptions, the benefits typically outweigh the cost of
another layer of assumptions. Researchers in the soft sciences need means, medians,
percentiles, and so on, though they should put more thought into which descriptive statistics to use, and should test whether using different descriptive statistics implies similar or different conclusions (Trafimow et al. 2018).
But it is less clear that researchers need p-values. In the first place, as I will
demonstrate presently, unlike typical descriptive statistics, p-values provide no added
benefit. Secondly, and directly to the present point, using p-values necessitates a layer
of inferential assumptions, some of which are guaranteed wrong. This guaranteed
wrongness implies that no matter how carefully the research project is performed, the
obtained p-value can be attributed to a problem with the inferential assumptions, as
well as a problem with one of the other types of assumptions. The result is a severe
impediment to attempting to draw conclusions about theories from data. Thus, the cost
of using p-values, though often not recognized, is immense. Further costs, particularly
when p-values are used for NHST, are dichotomous thinking (Greenland 2017;
Trafimow et al. 2018); questionable research practices to obtain p < .05 for the sake of
grants and publications (Bakker et al. 2012; John et al. 2012; Simmons et al. 2011;
Woodside 2016); published effect sizes that, on average, are much larger than true
effect sizes (Grice, 2017; Hyman, 2017; Kline, 2017; Locascio, 2017a; Locascio,
2017b; Marks, 2017; Open Science Collaboration 2015); replication crises (Earp and
Trafimow 2015; Halsey et al. 2015; Open Science Collaboration 2015; Trafimow, in
press), and other costs (see Briggs 2016; Hubbard 2016; Trafimow and Earp 2017;
Ziliak and McCloskey 2016; for reviews).

3 In fact, Rothman et al. (2013) provided arguments against random selection.

Do p-values convey any benefits to balance out the immense costs of using them?
Consider an example. A researcher obtains a large sample size, obtains a small p-value,
and rejects the model. Given that the researcher already knew the model was incorrect
before starting, there is no new information here. A counter might be to invoke the
closeness apology; that is, the low p-value shows that the model is not even close to
being true. But this conclusion does not follow validly. First, it commits the inverse
inference fallacy discussed earlier. Second, the p-value was not computed based on the
model being close; but rather based on the model being exactly right. Therefore, the p-
value has nothing to say about whether the model is close or far from the truth. Third,
again from the point of view of the closeness of the model to the truth, the p-value
confounds the implications of sample effect size and sample size. To see this quickly,
suppose that the researcher in our example had proposed a model that specifies the
predicted effect size and that the obtained effect size is close to it. The low p-value is
due to the large sample size. Well, then, the low p-value implies that the model is far
from the truth whereas the closeness of the sample effect size to the predicted one
implies that the model is close to the truth. Clearly, an intelligent researcher would
discount the p-value; not the sample effect size; though she would also keep in mind the
usual difficulties in drawing conclusions about theories from data. Although, in this
example, I specified a large sample size for the sake of simplicity, there is a larger
point. That is, the sample effect size is obviously relevant to assessing the closeness of
the model to the truth (though other factors are relevant too); and the sample size is
obviously relevant for assessing the extent to which the researcher should trust the
stability of the sample effect size (though other factors are relevant too). Given that the
researcher knows the sample effect size and the sample size, the p-value provides no
added benefit. Worse yet, the p-value confounds the implications of the sample effect
size and the sample size. If the issue were the exact correctness of the model, perhaps the confounding could be justified, though there would still be the inverse inference fallacy with which to contend. But given that the issue is the closeness of the model, there is no justification whatsoever for confounding the clear implications of the sample effect size and the sample size, which remain distinct unless combined via the computation of a p-value. In summary, the combination of no added benefit and immense
costs renders p-values strongly deleterious to the progress of science.
What About Research Not Designed to Test Theories?
One alternative to research designed to test theories is that researchers may wish to
perform applied research. For those researchers applying a theory, all the categories of
assumptions mentioned earlier—theoretical, auxiliary, statistical, and inferential—re-
main relevant. But it also is possible to perform applied research that is not based on
theory but depends on what might be considered substantive assumptions. However,
this change does not affect the message of the foregoing section. That is, there may be
substantive assumptions and statistical assumptions; whereas using p-values forces the
added complexity of including inferential assumptions too. Thus, even for applied
research, p-values render an already challenging task of evaluating substantive
assumptions more complex, by adding a layer of known wrong inferential assumptions.
It also is possible to imagine researchers using p-values for data fishing. But if one
is data fishing, it makes more sense to decide on relevant sample effect sizes for the
issue at hand, and screen based on those rather than based on p-values. Remaining with
data fishing, it is easy to imagine researchers having the computer perform two com-
puter runs. Whereas one run is based on sample effect sizes, the other run could be
based on sample sizes, where the researcher programs the run on the minimum sample
size necessary for her to trust the stability of the sample effect size. Even here, whatever
assumptions the researcher needs to make to go data fishing, there is nothing gained by
adding an extra layer of inferential assumptions where at least some of them are known
wrong a priori.
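To make this concrete, the following minimal Python sketch (not part of the original argument; the effect names, sample effect sizes, sample sizes, and thresholds are all hypothetical) screens candidate effects on effect-size and sample-size criteria alone, with no p-values computed.

effects = {                      # name: (sample effect size d, sample size n); hypothetical values
    "A vs B": (0.45, 180),
    "C vs D": (0.08, 2500),
    "E vs F": (0.60, 25),
}
MIN_D, MIN_N = 0.30, 100         # researcher-chosen screening thresholds (hypothetical)

for name, (d, n) in effects.items():
    keep = abs(d) >= MIN_D and n >= MIN_N
    print(f"{name}: effect size = {d:.2f}, n = {n}, retain = {keep}")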
Finally, let us consider exploratory research, where very little is known, and the
purpose of the research is to obtain empirical facts on which to base decisions about
whether or how to continue the line of research. Perhaps NHST can be useful here.
That is, if the p-value is statistically significant, that suggests that the researcher should
continue the line of research whereas if the p-value is not statistically significant, that
suggests that resources would be better devoted to pursuing an alternative line of
research. The main problem with this thinking is that the p-value again is no better than
the model on which it is based. We already have seen that the model is problematic in
the context of research performed to test theories, for application, and for data fishing.
The model is even more problematic for exploratory research where very little is
known and so there is even less basis for the model assumptions than usual. In addition
to the ubiquitous and wrong inferential assumption of random and independent sampling from a defined population, when very little is known there is even less justification than usual for inferential assumptions about distributions, linearity, and so on.
The sample effect size provides a much better indication of whether to continue the line
of investigation, and the sample size provides an indication of how much to trust the
stability of the sample effect size. The p-value provides no added benefit, confounds the
implications of sample effect size and sample size, and saddles researchers with
important costs.4

1 Confidence Intervals: The Usual Alternative to NHST

The usual recommendation by authorities who eschew NHST specifically, and p-values
more generally, is that researchers should use confidence intervals (see Cumming and
Calin-Jageman 2017 for a review). Contrary to this recommendation, the BASP p-value
ban included confidence intervals (CIs) too. Why?
To commence, it is worth considering what CIs do not accomplish. Researchers
often believe, for example, that a 95% CI provides an interval that has a 95% proba-
bility of containing the population parameter of interest. But this is not so. As
sophisticated aficionados of CIs admit, the researcher has no idea about the probability

4 The reader may wonder about p-values as used in NHST versus as used to provide continuous indices of alleged justified worry about the model. Although both are problematic for the reasons described, null hypothesis significance tests are worse because of the dichotomous thinking they encourage, and the dramatic overestimates of effect sizes in scientific literatures that they promote (see Locascio, 2017a for an explanation). If p-values were calculated but not used to draw any conclusions, their costs would be reduced though still without providing any added benefits.

that the constructed interval contains the population parameter. In fact, from a strict
frequentist perspective, the population parameter is in the interval (100% probability)
or not (0% probability), though the researcher does not know which. Even if one does
not take a strict frequentist perspective, the best that can be said about a 95% CI is that
if one were to take many samples, approximately 95% of the 95% CIs constructed,
would contain the population parameter.5 To conclude anything about the probability
that population parameter is within a CI constructed based on a single sample is to
commit another version of the inverse inference fallacy discussed earlier.
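A small simulation, offered purely as an illustration of the repeated-sampling reading described here (the population values and sample size are hypothetical), shows that roughly 95% of the constructed 95% intervals cover the population mean, even though any single interval either contains it or does not.

import numpy as np
from scipy import stats

# Repeated-sampling illustration: about 95% of 95% confidence intervals cover
# the true mean, but nothing can be said about any one interval in particular.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 100.0, 15.0, 30, 10_000   # hypothetical population and design

covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, n)
    se = x.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.975, df=n - 1)
    lo, hi = x.mean() - t_crit * se, x.mean() + t_crit * se
    covered += (lo <= mu <= hi)

print(f"Empirical coverage: {covered / reps:.3f}")   # close to 0.95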
Sophisticated CI aficionados understand the foregoing but nevertheless tout CIs
because they indicate the precision of the data. Wide CIs indicate less precision and
narrow CIs indicate more precision. Are CIs useful in a precision context?
In fact, they are not, as becomes clear upon consideration that there are at least three
types of precision, and all of them influence CI widths. It is well-known that the
researcher needs to obtain the standard error of the mean to compute a confidence
interval for a mean or a difference between means. The standard error of the mean is
influenced by the standard deviation and the sample size. In turn, the standard deviation
is influenced by random and systematic variance. According to classical test theory
(Gulliksen 1987; Lord and Novick 1968), the smaller the random variance, the greater
the measurement precision. In addition, the smaller the systematic variance (e.g., real
differences between people) within each group, the easier it is to distinguish the effects
of experimental manipulations in between-group analyses; Trafimow (2018) termed
this precision of homogeneity. Finally, the larger the sample size, the greater the
sampling precision.6 Thus, CI widths are influenced by three types of precision:
measurement precision, precision of homogeneity, and sampling precision. This triple
confounding will be discussed more later.
Trafimow (2018) derived equations that unconfound the joint effects of the three types of precision, provided that the researcher obtains a good estimate of the reliability of the dependent variable. At the population level, Eq. 1 gives the variance due to systematic factors (true differences between participants) σ_O² as a function of the reliability of the dependent variable ρ_XX' and the total variance σ_X². Equation 2 gives the variance due to randomness σ_R². As researchers normally do not have access to population data and consequently must depend on sample data, Eqs. 1* and 2* perform similar functions to Eqs. 1 and 2, respectively, but based on the sample reliability r_XX' and sample variance s_X².

$$\sigma_O^2 = \rho_{XX'}\sigma_X^2 \qquad (1)$$

$$\sigma_R^2 = \sigma_X^2 - \rho_{XX'}\sigma_X^2 \qquad (2)$$

$$\text{estimated } \sigma_O^2 = r_{XX'} s_X^2 \qquad (1^*)$$
5 Of course, even this very limited conclusion depends on the model being correct, and as we already have seen, the model is not correct because of problematic inferential assumptions.
6 Assuming random sampling, an assumption most likely incorrect.

$$\text{estimated } \sigma_R^2 = s_X^2 - r_{XX'} s_X^2. \qquad (2^*)$$

In summary, as σ_O² decreases and σ_R² decreases, there is greater precision of homogeneity and greater measurement precision, respectively.
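As a small illustration of Eqs. 1* and 2* (the reliability and variance values below are hypothetical), the following sketch splits a sample variance into its estimated systematic and random components.

# Sketch of Eqs. (1*) and (2*): split the sample variance into estimated
# systematic and random components using the sample reliability of the
# dependent variable. The input values are hypothetical.
def variance_components(r_xx, s2_x):
    s2_systematic = r_xx * s2_x          # estimated sigma_O^2, Eq. (1*)
    s2_random = s2_x - r_xx * s2_x       # estimated sigma_R^2, Eq. (2*)
    return s2_systematic, s2_random

s2_o, s2_r = variance_components(r_xx=0.80, s2_x=225.0)
print(s2_o, s2_r)   # 180.0 (systematic) and 45.0 (random)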
In contrast to precision of homogeneity and measurement precision, the third type
of precision—sampling precision—is in no way a function of the reliability of the
dependent variable. Rather, it is a function of the sample size. As a simple example,
consider the case where a researcher collects a sample mean with the goal of estimating
the population mean. The sampling precision is simply a function of the sample size n.
To see that this is so, imagine the simple case of only one group, and the goal is to
obtain a sample mean to estimate the population mean. Trafimow (2017) provided an
accessible derivation of Eq. 3 below, where f is the fraction of a standard deviation
within which the researcher wishes the sample mean to be of the population mean, and
where zc is the z-score that corresponds to the desired probability of being within f .
$$n = \left(\frac{z_c}{f}\right)^2. \qquad (3)$$

For example, suppose the researcher wishes to be 95% confident of obtaining a sample mean within 0.3 standard deviations of the population mean. The z-score corresponding to 95% is 1.96, and so instantiating the appropriate values into Eq. 3 gives the following, to the nearest upper whole number: $n = (1.96/0.3)^2 = 43$. Thus, the researcher must collect at least 43 participants to have a 95% probability of obtaining a sample mean within three-tenths of a standard deviation of the population mean.
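The worked example can be reproduced in a few lines; the function below is a minimal sketch of Eq. 3, with the confidence level and the fraction f supplied by the researcher.

import math
from scipy import stats

def a_priori_n(f, confidence=0.95):
    """Sample size from Eq. (3): with probability `confidence`, the sample mean
    falls within a fraction f of a standard deviation of the population mean."""
    z_c = stats.norm.ppf(0.5 + confidence / 2)   # two-sided critical z-score
    return math.ceil((z_c / f) ** 2)             # round up to a whole participant

print(a_priori_n(f=0.3))   # 43, matching the worked example above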
I claimed earlier that using CIs causes a confound involving the implications of the
three types of precision. The easiest way to see this is to consider an equation Trafimow (2018) derived that gives the standard error of the mean as a function of the three types of precision indices: σ_O, σ_R, and n.

$$\text{Standard Error of the Mean} = SD_{\bar{X}} = \frac{\sqrt{\sigma_O^2 + \sigma_R^2}}{\sqrt{n}}. \qquad (4)$$

Equation 4 shows that very many possible combinations of values for σ_O, σ_R, and n can result in similar results for $SD_{\bar{X}}$. Thus, whatever a researcher estimates $SD_{\bar{X}}$ to be, there is no way for the researcher to know the extent to which the obtained value is because of impressive or unimpressive levels of each of the three types of precision. To render the triple confound easy to see, Fig. 1 illustrates how many values for the different levels of precision can result in the same value for $SD_{\bar{X}}$, which was arbitrarily set at 10 for every curve in Fig. 1.
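The confound can also be shown numerically. The sketch below, with hypothetical values in the spirit of Fig. 1, lists several quite different combinations of σ_O, σ_R, and n that all hold the standard error of the mean at the same target value.

import math

# Triple confound in Eq. (4): different mixes of precision of homogeneity
# (sigma_O), measurement precision (sigma_R), and sampling precision (n) can
# produce the same standard error of the mean. Target and grids are hypothetical.
TARGET_SE = 10.0

for sigma_o in (20.0, 40.0, 80.0):
    for sigma_r in (20.0, 40.0, 80.0):
        n = (sigma_o**2 + sigma_r**2) / TARGET_SE**2   # n that keeps SE at the target
        print(f"sigma_O = {sigma_o:5.1f}, sigma_R = {sigma_r:5.1f}, "
              f"n = {math.ceil(n):4d} -> same SE of {TARGET_SE}")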
The fact that it is possible to obtain separate estimates of the three types of precision (σ_O, σ_R, n), rather than confounding their implications by using a CI, sets up a dilemma for CI aficionados who tout the precision argument as their justification for using CIs. Specifically, if a researcher cares about precision, there is little excuse for failing to measure the reliability of the dependent variables, and using the appropriate equations

Fig. 1. The sampling precision n necessary to keep the standard error of the mean constant at 10 is presented along the vertical axis as a function of measurement precision σ_R presented along the horizontal axis and six curves representing different levels of precision of homogeneity σ_O.

to estimate the three types of precision separately. In this case, there is no reason to
compute a triply confounded confidence interval when the researcher can have much
more fine-grained information in the form of separate estimates of the three types of
precision. Alternatively, the researcher might not care about precision, and might not
measure the reliability of the dependent variable; but in the case where the researcher
does not care about precision, the precision argument fails to justify a CI. Thus, in
either case, whether the researcher cares about precision or does not care about it, there
is little reason to compute a CI.
My case against CIs can be summarized as follows. If they are used merely as
another way to perform NHST; that is, to reject the null hypothesis if the value
specified by the null hypothesis is outside the computed CI; then CIs have the same
problems that plague NHST. If CIs are used for the sake of precision—the “proper” use
according to sophisticated authorities—then there is the issue of the confounding of
three different types of precision in the standard error of the mean necessary to compute
CIs. Either way, CIs are problematic. Finally, as is true of p-values, the computation of
CIs necessitates appending a layer of inferential assumptions—some of which are
guaranteed to be wrong—on top of the other assumptions made, thereby further
complicating an already difficult task of using data to come to justifiable conclusions.

2 Conclusion

It is possible to agree with the many problems that plague NHST and CIs but never-
theless support them based on a perceived lack of alternatives. Contrary to many
researchers’ perceptions, however, there are many alternatives. An underrated alter-
native, but a good one, is to simply not perform inferential statistics. Lest the reader
experience dismay, or even panic, at this possibility; consider that for centuries before
NHST and CIs had been invented, researchers nevertheless conducted science. Is this
because scientists of yesteryear were smarter than scientists of today, and so they made
progress without the typical inferential statistics currently prevalent and deemed nec-
essary? My argument is to the contrary. Because NHST and CIs force the researcher to
append a layer of inferential assumptions—some of which are guaranteed to be wrong
—on top of their other assumptions, researchers of yesteryear who were not burdened
with this layering had an important advantage over contemporary researchers.7 By
dispensing with NHST and CIs, contemporary researchers could rid themselves of the
unnecessary burden, and accordingly enjoy the benefits of a simpler and less cum-
bersome scientific process. As a supplement to simply avoiding NHST and CIs, there is
a variety of ways to use visual displays to enhance the ability of researchers to
understand their descriptive statistics and draw useful conclusions (Valentine et al.
2015). As editor of BASP, I have supported improved descriptive statistics and visual
displays of the data.
Another possibility is to consider completely different approaches. One of these is quantum probability (Trueblood and Busemeyer 2011, 2012), which includes the assumption that the typically assumed commutative nature of events can be violated. That is, the typical assumption is that $p(A \cap B \mid H) = p(B \cap A \mid H)$. By applying Bayes' theorem, a further implication is that $p(H \mid A \cap B) = p(H \mid B \cap A)$. Because quantum probability does not assume commutative properties, Trueblood and Busemeyer (2012) showed that it is possible to account for much data that otherwise is difficult to explain.
Yet another possibility is for researchers to use one of the a priori procedures that
have been invented only recently. The basic concept commences with the goal that
many researchers have, which is to obtain sample statistics that are good estimates of
corresponding population parameters. With this goal in mind, two obvious issues
concern the desired closeness of sample statistics to population parameters, and the
probabilities of being within the desired distances. The researcher can specify, before
data collection, specifications for distances and probabilities. By using one of a
growing inventory of a priori equations to suit different study designs, different
inferential assumptions, and different statistics (e.g., Trafimow 2017b; Trafimow and
MacDonald 2017; Trafimow, Wang, and Wang 2017; Wang, Wang, Trafimow, and
Myüz, under submission); the researcher can determine the number of participants
needed to be confident that descriptive statistics of interest are good estimates of their
corresponding population parameters. Because a priori equations are used prior to data

7 This argument should not be interpreted as indicating that contemporary researchers are at an overall disadvantage. In fact, contemporary researchers have many advantages over the researchers of yesteryear, including better knowledge, better technology, and others.

collection, there is no necessity to perform NHST or CIs after data collection. Rather,
the researcher who uses the a priori procedure simply performs the relevant calcula-
tions, collects the sample size indicated, and then trusts that the descriptive statistics to
be obtained are good estimates of the corresponding population parameters, based on a
priori specifications.8
Finally, of course, there are Bayesian procedures. A proper discussion of Bayesian
procedures is far beyond present scope, but the reader can gain an appreciation of the
different varieties of Bayesian thinking, along with philosophical problems associated
with each, by consulting Gillies (2000).
Clearly, then, there are alternatives to NHST and CIs, though each of the alter-
natives may have their own issues. Although the alternatives still need more consid-
eration before their unqualified acceptance, they are sufficient, at least, to counter the
perception that there are no alternatives. I reiterate that there is always the option of
simply not using inferential procedures of any type which; even if the reader rejects,
quantum probability, a priori procedures, and Bayesian procedures; is superior to
NHST and CIs. My hope and expectation is that TES2019 will help fulfill the function
of seeing out NHST and CIs.

References
Bakker, M., van Dijk, A., Wicherts, J.M.: The rules of the game called psychological science.
Perspect. Psychol. Sci. 7(6), 543–554 (2012)
Berk, R.A., Freedman, D.A.: Statistical assumptions as empirical commitments. In: Blomberg, T.
G., Cohen, S. (eds.) Law, Punishment, and Social Control: Essays in Honor of Sheldon
Messinger. 2nd edn., pp. 235–254. Aldine de Gruyter (2003)
Box, G.E.P., Draper, N.R.: Empirical Model-Building and Response Surfaces. Wiley, New York
(1987)
Briggs, W.: Uncertainty: The Soul of Modeling, Probability and Statistics. Springer, New York
(2016)
Cumming, G., Calin-Jageman, R.: Introduction to the New Statistics: Estimation, Open Science,
and Beyond. Taylor and Francis Group, New York (2017)
Duhem, P.: The Aim and Structure of Physical Theory (P.P. Wiener, Trans). Princeton University
Press, Princeton (1954). (Original work published 1906)
Earp, B.D., Trafimow, D.: Replication, falsification, and the crisis of confidence in social
psychology. Front. Psychol. 6, 1–11, Article 621 (2015)
Gillies, D.: Philosophical Theories of Probability. Routledge, London (2000)
Greenland, S.: Invited commentary: the need for cognitive science in methodology. Am.
J. Epidemiol. 186, 639–645 (2017)
Gulliksen, H.: Theory of Mental Tests. Lawrence Erlbaum Associates Publishers, Hillsdale
(1987)
Halsey, L.G., Curran-Everett, D., Vowler, S.L., Drummond, G.B.: The fickle P value generates
irreproducible results. Nat. Methods 12, 179–185 (2015). https://doi.org/10.1038/nmeth.3288

8 This sparse description may seem to imply that a priori procedures are simply another way to perform power analyses. However, this is not true, and I have provided demonstrations of the differences, including contradictory effects (Trafimow 2017b; Trafimow and MacDonald 2017).

Hubbard, R.: Corrupt Research: The Case for Reconceptualizing Empirical Management and
Social Science. Sage Publications, Los Angeles (2016)
John, L.K., Loewenstein, G., Prelec, D.: Measuring the prevalence of questionable research
practices with incentives for truth telling. Psychol. Sci. 23(5), 524–532 (2012)
Lakatos, I.: The Methodology of Scientific Research Programmes. Cambridge University Press,
Cambridge (1978)
Lord, F.M., Novick, M.R.: Statistical Theories of Mental Test Scores. Addison-Wesley, Reading
(1968)
Nguyen, H.T.: On evidential measures of support for reasoning with integrated uncertainty: a
lesson from the ban of P-values in statistical inference. In: Huynh, V.N., et al., (eds.)
Integrated Uncertainty in Knowledge Modeling and Decision Making. Lecture Notes in
Artificial Intelligence, vol. 9978, pp. 3–15. Springer (2016)
Open Science Collaboration: Estimating the reproducibility of psychological science. Science 349(6251), aac4716 (2015). https://doi.org/10.1126/science.aac4716
Rothman, K.J., Galacher, J.E.J., Hatch, E.E.: Why representativeness should be avoided. Int.
J. Epidemiol. 42(4), 1012–1014 (2013)
Simmons, J.P., Nelson, L.D., Simonsohn, U.: False-positive psychology: undisclosed flexibility
in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22(11),
1359–1366 (2011)
Speelman, C.P., McGann, M.: Editorial: challenges to mean-based analysis in psychology: the
contrast between individual people and general science. Front. Psychol. 7, 1234 (2016)
Trafimow, D.: Editorial. Basic Appl. Soc. Psychol. 36(1), 1–2 (2014)
Trafimow, D.: Implications of an initial empirical victory for the truth of the theory and
additional empirical victories. Philos. Psychol. 30(4), 411–433 (2017a)
Trafimow, D.: Using the coefficient of confidence to make the philosophical switch from a
posteriori to a priori inferential statistics. Educ. Psychol. Meas. 77(5), 831–854 (2017b)
Trafimow, D.: An a priori solution to the replication crisis. Philos. Psychol. 31, 1188–1214
(2018)
Trafimow, D.: A taxonomy of model assumptions on which P is based and implications for added
benefit in the soft sciences (under submission)
Trafimow, D., Amrhein, V., Areshenkoff, C.N., Barrera-Causil, C.J., Beh, E.J., Bilgiç, Y.K.,
Bono, R., Bradley, M.T., Briggs, W.M., Cepeda-Freyre, H.A., Chaigneau, S.E., Ciocca, D.R.,
Correa, J.C., Cousineau, D., de Boer, M.R., Dhar, S.S., Dolgov, I., Gómez-Benito, J.,
Grendar, M., Grice, J.W., Guerrero-Gimenez, M.E., Gutiérrez, A., Huedo-Medina, T.B., Jaffe,
K., Janyan, A., Karimnezhad, A., Korner-Nievergelt, F., Kosugi, K., Lachmair, M., Ledesma,
R.D., Limongi, R., Liuzza, M.T., Lombardo, R., Marks, M.J., Meinlschmidt, G., Nalborczyk,
L., Nguyen, H.T., Ospina, R., Perezgonzalez, J.D., Pfister, R., Rahona, J.J., Rodríguez-
Medina, D.A., Romão, X., Ruiz-Fernández, S., Suarez, I., Tegethoff, M., Tejo, M., van de
Schoot, R., Vankov, I.I., Velasco-Forero, S., Wang, T., Yamada, Y., Zoppino, F.C.M.,
Marmolejo-Ramos, F.: Manipulating the alpha level cannot cure significance testing. Front.
Psychology. 9, 699 (2018)
Trafimow, D., MacDonald, J.A.: Performing inferential statistics prior to data collection. Educ.
Psychol. Meas. 77(2), 204–219 (2017)
Trafimow, D., Marks, M.: Editorial. Basic Appl. Soc. Psychol. 37(1), 1–2 (2015)
Trafimow, D., Marks, M.: Editorial. Basic Appl. Soc. Psychol. 38(1), 1–2 (2016)
Trafimow, D., Wang, T., Wang, C.: Means and standard deviations, or locations and scales? That
is the question! New Ideas Psychol. 50, 34–37 (2018)
Trafimow, D., Wang, T., Wang, C.: From a sampling precision perspective, skewness is a friend
and not an enemy! Educ. Psychol. Meas. (in press)
Trueblood, J.S., Busemeyer, J.R.: A quantum probability account of order effects in inference.
Cogn. Sci. 35, 1518–1552 (2011)
Trueblood, J.S., Busemeyer, J.R.: A quantum probability model of causal reasoning. Front.
Psychol. 3, 138 (2012)
Valentine, J.C., Aloe, A.M., Lau, T.S.: Life after NHST: how to describe your data without “p-
ing” everywhere. Basic Appl. Soc. Psychol. 37(5), 260–273 (2015)
Wasserstein, R.L., Lazar, N.A.: The ASA’s statement on p-values: context, process, and purpose.
Am. Stat. 70, 129–133 (2016)
Woodside, A.: The good practices manifesto: overcoming bad practices pervasive in current
research in business. J. Bus. Res. 69(2), 365–381 (2016)
Ziliak, S.T., McCloskey, D.N.: The Cult of Statistical Significance: How the Standard Error
Costs us Jobs, Justice, and Lives. The University of Michigan Press, Ann Arbor (2016)
Kalman Filter and Structural Change
Revisited: An Application to Foreign
Trade-Economic Growth Nexus

Omorogbe Joseph Asemota(B)

Department of Research and Training, Macroeconomic Unit, National Institute for Legislative and Democratic Studies, Abuja, Nigeria
asemotaomos@yahoo.com, asemota@econ.kyushu-u.ac.jp, omorogbe.asemota@nils.gov.ng

Abstract. In the last two decades, Nigeria's economy has been affected by several shocks such as the 1985-86 oil price crash, the 1997 Asian financial crisis, the 2008-2009 global financial crisis, the oil price crash that started in 2014, as well as political uncertainties in the country. Parameters of econometric models are dependent on prevailing policy and will react to policy changes. Yet, previous research on trade-growth modelling in Nigeria has assumed parameter constancy over time. Thus, this paper constructed a time-varying parameter model for the trade-growth nexus and
demonstrated how the model can be useful in the detection of structural
breaks and outliers. The paper first demonstrated via the rolling regres-
sion method that the parameters have been time dependent and pro-
ceeded with the Kalman filter to estimate the transition of the changing
parameters of the trade-growth nexus. Thereafter, it presented applica-
tions to show how the auxiliary residuals of the model can be used to
detect the time of structural breaks and outliers.

Keywords: Exports · Imports · Kalman filter · Smoother · Outliers · State space model · Structural breaks · Time-varying coefficients

1 Introduction

The notion of structural change has received considerable attention in the econometric literature in recent times. This is premised on various theories of stages of economic development and growth which assume that economic relationships change over time. Initially, such changes were explained in descriptive form.
However, with the introduction of regression analysis as the principal tool of
economic data processing in the 1950s and 1960s, statisticians and econome-
tricians started to describe structural changes in macroeconomics in regression
framework (Asemota and Saeki 2012). Structural change is defined as a change in

I thank Tom Doan of ESTIMA for his comments when writing the RATS codes used
in the analysis.

one or more of the parameters of an econometric model, and usually the “change
point” or “break point” is often unknown. Hansen (2001) posited that an unde-
tected structural change or break can lead to three major problems in time
series analysis: inferences can be wrong; forecasts can be inaccurate; and pol-
icy recommendations may be misleading. The econometrics of structural change
looks for systematic methods of identifying, estimating, testing and monitoring
of structural breaks.
Kim and Siegmund (1989) proposed a test to detect structural change in a
simple linear regression against two different alternatives: one specifies that only
the intercept changes while the second permits the intercept and the slope to
change. Bai and Perron (1998) tested for multiple structural changes occurring at
unknown dates in a linear regression model estimated by ordinary least squares.
Their approach was based on testing for partial structural change model where
all parameters are not subjected to shifts as against pure structural change model
where all coefficients are subjected to change. Xu and Perron (2017) noted that
forecasting models are subject to instabilities, leading to imprecise and unreli-
able forecasts. Thus, they proposed a frequentist-based approach to forecast time
series in the presence of in-sample and out-of-sample breaks in the parameters
of the model. In Xu and Perron (2017) model, the parameters evolved through a
random level shift process, with the occurrence of a shift governed by a Bernoulli
process. Hauwe et al. (2011) and Maheu and Gordon (2008) proposed a Bayesian
approach to modelling structural breaks in time series models. The strength of
the Bayesian approach is that it uses the prior distribution to treat the param-
eters as random and allow the breaks to occur with some probability. However,
the Bayesian approach is sensitive to the exact prior distributions used (Xu and
Perron 2017). Harvey (1981) demonstrated the use of the Kalman filter for
obtaining maximum likelihood estimates of parameters through prediction error
decomposition. From Harvey’s work, it became clear that a wide range of
econometric models, including regression models with time-varying coefficients,
autoregressive moving average (ARMA) models, and unobserved-components
time series models could be cast in state space form. Asemota (2016) noted that despite the differences in the state space and ARIMA modelling strategies, the two models can be considered equivalent; however, the local level state space model clearly outperformed the ARIMA model at all forecasting horizons considered in that analysis.
While the state space form allows unobserved components to be incorpo-
rated into a model, the Kalman filter algorithm is used in the model estimation.
Theodosiadou et al. (2017) applied the recursive Kalman filter algorithm to
detect change points in NASDAQ daily returns and three of its stocks where the
jumps are assumed to be hidden random variables. Hamilton (1989) modelled
the instability of parameters using the Markov-switching models and Perron
(1989) incorporated structural breaks into the unit root tests. Ito et al. (2017)
noted that the time-varying parameter models are flexible enough to capture
the complex nature of a macroeconomic system, thus yielding better forecasts
and a better fit to data than models with constant parameters. They proposed
a non-Bayesian, regression-based or generalized least squares (GLS)-based approach to estimate a class of time-varying AR parameter models. Ito et al. (2017)
demonstrated that their approach yielded smoothed estimate that is identical to
the Kalman-smoothed estimate. Asemota (2012) modeled the inflow of tourists
to Japan using time varying parameter model and evaluated the forecasting
performance of the model.
Economic theory indicates that a definite relationship exists between a coun-
try’s trade flows and its current account (Sunanda 2010). Furthermore, this
relationship can be extended to the theory that increased exports positively
contribute to the economy of a country or region. In Nigeria, the importance
of trade to the national economy cannot be overemphasized. According to the
National Bureau of Statistics (NBS, 2018) trade contributed 16.86% to Nige-
ria’s GDP at the end of 2017. This affirms that trade is an important sec-
tor of the Nigerian economy. Trade creates economic opportunities for people,
income opportunities, job creation and improvement in the general standard of
living. Hence, several researchers such as Oluchukwu and Ogochukwu (2015),
Arodoye and Iyoha (2014), Omoke and Ugwuanyi (2010), Iyoha (1998), Ekpo
and Egwaikhide (1994) have examined the trade-growth nexus in Nigeria.
However, their methodology assumed constancy of parameters over time.
The assumption of the constant parameter model may not be appropriate for
the foreign trade-growth nexus since foreign trade reacts to changes in monetary
and fiscal policies. Thus, this paper constructed a time varying parameter model
for Nigeria’s trade-growth nexus. The time-varying models allow the parameters
to change gradually over time, which is the main difference between time-varying
models and Markov switching models. It presented how the residuals in the model
can be used to detect the time of structural breaks and outliers. The rest of the
paper is structured as follows: in Sect. 2, the state space model and the Kalman
filter equations are introduced. Section 3 presents the model specification, and
the results of the empirical application are presented in Sect. 4. The detection
of outliers and structural breaks are presented in Sect. 5, and the final section
concludes the paper.

2 State Space Model and the Kalman Filter

The state space model consists of two equations: the state equation (also called
transition or system equation) and the observation equation (also called measure-
ment equation). The measurement equation relates the observed variables (data)
and the unobserved state variables, while the transition equation describes the
dynamics of the state variables. The state-space representation of the dynamics
of y is given by the following systems of equations:

$$y_t = A X_t + H\beta_t + \epsilon_t \qquad (1)$$

$$\beta_t = F\beta_{t-1} + \nu_t \qquad (2)$$

The vectors ε_t and ν_t are white noise, such that

$$E(\epsilon_t \epsilon_\tau') = R \ \text{ for } t = \tau, \text{ and } 0 \text{ otherwise} \qquad (3)$$

$$E(\nu_t \nu_\tau') = Q \ \text{ for } t = \tau, \text{ and } 0 \text{ otherwise} \qquad (4)$$

And the disturbances are assumed to be uncorrelated at all lags:

$$E(\epsilon_t \nu_\tau') = 0 \ \text{ for all } t \text{ and } \tau \qquad (5)$$


where β_t is a k × 1 vector of unobserved state variables, H is an n × k matrix that links the observed vector y_t to the unobserved β_t, X_t is an r × 1 vector of exogenous or predetermined observed variables, and R and Q are (n × n) and (k × k) matrices, respectively.
The Kalman filter algorithm is used in estimating the unobserved state vec-
tor βt and provides a minimum mean squared error estimate of βt given the
information set. The Kalman filter algorithm is further divided into Kalman fil-
tering and Kalman smoothing. While the Kalman filter gives an estimate of β_t on the basis of data observed through date t, the smoothing provides an estimate of β_t based on all the available data in the sample through date T.
The Kalman filter recursive algorithm consists of two stages- prediction and
updating stages. In the first stage, an optimal predictor of yt based on all avail-
able information up to t−1 (ŷ t| t−1 ) is obtained. To achieve this, β̂ t| t−1 is cal-
culated. On observing yt , the prediction error can be calculated as η t| t−1 = yt
- ŷ t| t−1 . Thus, making use of the information in the prediction error, a more
accurate estimate of βt (β̂ t| t ) can be obtained. The celebrated Kalman filter
equations are given by following equations:

β̂t| t−1 = F βt−1| t−1 (6)

Pt| t−1 = F Pt−1| t−1 F  + Q (7)

ηt| t−1 = yt − ŷt−1| t−1 = AXt + Hβt − AXt − H β̂t| t−1 = H(βt − β̂t| t−1 ) + t (8)

ft| t−1 = HPt| t−1 H  + R (9)

β̂t| t = β̂t| t−1 + Kt ηt| t−1 (10)

Pt| t = Pt| t−1 − Pt| t−1 H(HPt| t−1 H  + R)−1 H  Pt| t−1 (11)
 −1
where Kt = F Pt| t−1 (HPt| t−1 H + R) is the Kalman gain matrix, which is the
weight assigned to new information about βt contained in the prediction error1 .
1 This section draws largely from Asemota and Saeki (2012).

For a detailed derivation of the Kalman filter equations, see Hamilton (1994) and Durbin and Koopman (2001). In Eq. (7), the uncertainty underlying β̂_{t|t-1} is a function of the uncertainty underlying β̂_{t-1|t-1} and of Q, the covariance matrix of the shocks to β_t. Eq. (8) decomposes the prediction error into two components: the first component, H(β_t − β̂_{t|t-1}), might be called “parameter uncertainty”, which reflects the fact that β̂_{t|t-1} will differ from the true β_t; the second component is the error due to ε_t, a random shock to y_t. Thus, in Eq. (9), the mean squared error matrix of the prediction error is a function of the uncertainty associated with β̂_{t|t-1} and of R, the variance of ε_t. The updating equation in (10) indicates that β̂_{t|t} is formed as a sum of β̂_{t|t-1} and a weighted average of the prediction error; the weight assigned to the prediction error is the Kalman gain, K_t.
In some applications, inference about the value of βt is based on all the sample
data. This inference is known as the smoothed estimate of βt . The smoothed
estimates provide more accurate inference on βt . The following two equations
can be iterated backwards for t = T − 1, T − 2, ..., 1, to obtain the smoothed
estimates:

$$\hat{\beta}_{t|T} = \hat{\beta}_{t|t} + P_{t|t} F' P_{t+1|t}^{-1}\left(\hat{\beta}_{t+1|T} - F\hat{\beta}_{t|t}\right) \qquad (12)$$

$$P_{t|T} = P_{t|t} + P_{t|t} F' P_{t+1|t}^{-1}\left(P_{t+1|T} - P_{t+1|t}\right) P_{t+1|t}^{-1} F P_{t|t} \qquad (13)$$

Thus, the sequence of smoothed estimates $\{\hat{\beta}_{t|T}\}_{t=1}^{T}$ is calculated as follows. First, the Kalman filter estimates are calculated and stored. The smoothed estimates for the final date in the sample (β̂_{T|T} and P_{T|T}), which are the initial values for the smoothing, are just the last iteration of the Kalman filter (β̂_{t|t} and P_{t|t}). Next, Eqs. (12) and (13) are used moving through the sample backward, starting with t = T − 1, T − 2, ..., 1.
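The recursions in Eqs. (6)-(13) are straightforward to code. The Python sketch below (it is not the RATS code used in this paper) assumes, for simplicity, that there is no exogenous term A X_t and that the system matrices H, F, Q and R are known; it returns the filtered and smoothed state estimates.

import numpy as np

def kalman_filter_smoother(y, H, F, Q, R, beta0, P0):
    """Kalman filter (Eqs. 6-11) and fixed-interval smoother (Eqs. 12-13).
    Simplifying assumptions: no exogenous A*X_t term; H, F, Q, R known.
    y: (T, n) observations; H: (n, k); F, Q: (k, k); R: (n, n);
    beta0, P0: initial state mean (k,) and covariance (k, k)."""
    T = y.shape[0]
    k = F.shape[0]
    beta_pred = np.zeros((T, k)); P_pred = np.zeros((T, k, k))
    beta_filt = np.zeros((T, k)); P_filt = np.zeros((T, k, k))

    b, P = beta0, P0
    for t in range(T):
        # Prediction step, Eqs. (6)-(7)
        b_p = F @ b
        P_p = F @ P @ F.T + Q
        # Prediction error and its variance, Eqs. (8)-(9)
        eta = y[t] - H @ b_p
        f = H @ P_p @ H.T + R
        # Updating step, Eqs. (10)-(11), with gain K = P_p H' f^{-1}
        K = P_p @ H.T @ np.linalg.inv(f)
        b = b_p + K @ eta
        P = P_p - K @ H @ P_p
        beta_pred[t], P_pred[t] = b_p, P_p
        beta_filt[t], P_filt[t] = b, P

    # Fixed-interval smoother, Eqs. (12)-(13), run backwards from t = T-1
    beta_sm = beta_filt.copy(); P_sm = P_filt.copy()
    for t in range(T - 2, -1, -1):
        J = P_filt[t] @ F.T @ np.linalg.inv(P_pred[t + 1])
        beta_sm[t] = beta_filt[t] + J @ (beta_sm[t + 1] - beta_pred[t + 1])
        P_sm[t] = P_filt[t] + J @ (P_sm[t + 1] - P_pred[t + 1]) @ J.T
    return beta_filt, beta_sm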
Durbin and Koopman (2001) showed that the Kalman filter is a useful device
for recursively solving the state-space model and argued that the flexibility of
the state-space model would be particularly important for applications in which
issues such as missing observations, structural breaks, outliers, and forecasting
are paramount.

3 Model Specification

Oluchukwu and Ogochukwu (2015) investigated the nexus between economic growth and foreign trade using a double log-linear regression model estimated by ordinary least squares. Their model is presented in Eq. (14).

$$\ln RGDP = \psi_0 + \psi_1 \ln EX + \psi_2 \ln IMP + \psi_3 \ln EXR + \psi_4 \ln FDI + \omega_t \qquad (14)$$

where RGDP denotes Real Gross Domestic Product, EX is the volume of export,
IMP is the volume of import, EXR is the exchange rate and FDI is foreign direct
investment. However, model (14) assumed constancy of parameters, which cannot
be guaranteed, and the model does not allow the parameters to vary across time.
Thus, the specification might be inefficient because parameters are expected to
react to changes in policy and other shocks to the economy. An alternative
flexible model is the time varying coefficient (TVC) model formulated in the
state space form. The state space formulation of model (14) is:

$$\ln RGDP = \psi_{0t} + \psi_{1t} \ln EX + \psi_{2t} \ln IMP + \psi_{3t} \ln EXR + \psi_{4t} \ln FDI + \epsilon_t \qquad (15)$$

$$\psi_{it} = \phi_i \psi_{i,t-1} + \omega_{it}, \quad \text{where } i = 1, 2, 3 \text{ and } 4 \qquad (16)$$


Equation (15) is the observation or measurement equation, while Eq. (16) is the transition or state equation, which is used to simulate how the parameters evolve
over time. In most empirical studies, the specification used for the transition
equation is the random walk process (RWP). The RWP has been proved to have
the capability to capture structural change in econometric models, see Song and
Witt (2000) and Shen et al. (2009).

$$\psi_{it} = \psi_{i,t-1} + \omega_{it} \qquad (17)$$

$$\ln RGDP_t - \psi_{0t} = \begin{bmatrix} \ln EX_t & \ln IMP_t & \ln EXR_t & \ln FDI_t \end{bmatrix} \begin{bmatrix} \psi_{1t} \\ \psi_{2t} \\ \psi_{3t} \\ \psi_{4t} \end{bmatrix} + \epsilon_t \qquad (18)$$

$$\begin{bmatrix} \psi_{1t} \\ \psi_{2t} \\ \psi_{3t} \\ \psi_{4t} \end{bmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{bmatrix} \psi_{1,t-1} \\ \psi_{2,t-1} \\ \psi_{3,t-1} \\ \psi_{4,t-1} \end{bmatrix} + \begin{bmatrix} \eta_t \\ \zeta_t \\ \xi_t \\ \pi_t \end{bmatrix} \qquad (19)$$
The first equation is the observation equation with time varying parameters
and the other equations are the transition equations. The transition equations
describe the dynamics of the parameters, which are assumed to follow a random
walk process. These state space models can be estimated by applying the Kalman
filter earlier described. In these models, the coefficients can change over time,
which allows the parameters to respond differently under policy changes. The
unknown variance parameters in models (18) and (19) are estimated by maximum likelihood via the Kalman filter prediction error decomposition, initialized with the exact initial Kalman filter.
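As an illustration of how such an estimation might be coded (this paper's estimates were produced in RATS, not with the sketch below), the following Python function writes the Gaussian log-likelihood of models (18) and (19) through the prediction error decomposition and hands it to a BFGS optimizer; for simplicity it uses a large-variance initialization in place of the exact initial Kalman filter, and all variable names are hypothetical.

import numpy as np
from scipy.optimize import minimize

def negative_loglik(log_params, y, X):
    """Negative Gaussian log-likelihood of the time-varying coefficient model
    via the Kalman filter prediction error decomposition. log_params holds log
    variances: measurement variance first, then one per coefficient.
    y: (T,) dependent variable; X: (T, k) regressors (e.g. a constant plus
    lnEX, lnIMP, lnEXR and lnFDI). A large-variance prior stands in for the
    exact initial Kalman filter used in the chapter (a simplification)."""
    T, k = X.shape
    sig2_eps = np.exp(log_params[0])
    Q = np.diag(np.exp(log_params[1:]))
    b = np.zeros(k)
    P = np.eye(k) * 1e6                  # diffuse-like initialization
    loglik = 0.0
    for t in range(T):
        h = X[t]                         # time-varying "design" row
        P_p = P + Q                      # prediction: F = I under Eq. (17)
        eta = y[t] - h @ b               # prediction error
        f = h @ P_p @ h + sig2_eps       # prediction error variance
        K = P_p @ h / f                  # Kalman gain
        b = b + K * eta                  # updated coefficients
        P = P_p - np.outer(K, h) @ P_p   # updated covariance
        loglik += -0.5 * (np.log(2 * np.pi) + np.log(f) + eta**2 / f)
    return -loglik                       # first few terms may be excluded in practice

# Hypothetical usage, with y and X built from the logged series:
# res = minimize(negative_loglik, x0=np.full(1 + X.shape[1], -2.0),
#                args=(y, X), method="BFGS")
# estimated_variances = np.exp(res.x)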

4 Model Estimation and Results


Annual data from 1981 through 2016 obtained from the World Development Indicators of the World Bank are used in the analysis. Descriptive statistics for the data are given in Table 1 and a graphical display of the series is presented in Fig. 1.2
2 The RATS (version 8.3) econometric software is used for all our estimations.

Table 1. Data and Descriptive Statistics (Source: Author's computation)

Macro variable   No. of Observations   Mean      Standard deviation   Minimum   Maximum
RGDP             36                    24.9647   1.1319               23.4826   27.0663
Export           36                    23.6704   1.1052               21.7382   25.6994
Import           36                    23.3064   1.1539               21.4911   25.2049
Exchange rate    36                    3.2938    1.9477               -0.4943   5.5353
FDI              36                    21.1956   1.0996               19.0581   22.9027

Fig. 1. Time series plot of the macroeconomic variables.


Fig. 2. Rolling regression parameter estimates for trade-growth Nexus.

To justify the use of a time-varying parameter model, a total of 16 rolling regressions was first performed. The graphical display of the rolling parameters
is presented in Fig. 2. Parameter instability over time is clearly evident in Fig. 2, hence the need to consider the time-varying parameter class of models in modelling the trade-growth nexus. The export and import coefficient estimates (ψ̂1) and (ψ̂2), respectively, take on the theoretically expected positive values throughout the estimated sample period. The regression coefficient for FDI (ψ̂4) started with negative values but later took on positive values between 2009 and 2012; it however drifted back to negative values from 2012. The coefficient of the exchange rate (ψ̂3) takes on the theoretically expected negative values throughout the sample period; however, evidence of parameter instability is noticeable in its movement over the sample period.
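A rolling regression of the kind described above can be sketched as follows; the window length is not restated here, but with 36 annual observations a window of 21 would yield 36 - 21 + 1 = 16 rolling regressions, matching the count reported.

import numpy as np

def rolling_ols(y, X, window):
    """Rolling OLS: one regression per window of consecutive observations.
    y: (T,) dependent variable; X: (T, k) regressors including a column of
    ones for the intercept. Returns the (T - window + 1, k) coefficient paths."""
    T = len(y)
    coefs = []
    for start in range(T - window + 1):
        beta, *_ = np.linalg.lstsq(X[start:start + window],
                                   y[start:start + window], rcond=None)
        coefs.append(beta)
    return np.asarray(coefs)

# Hypothetical usage: betas = rolling_ols(y, X, window=21)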
Thus, the plots of the rolling regressions reveal inherent parameter instability that calls for estimation of the model using a more sophisticated technique that allows the estimated parameters to be time dependent. Thereafter, the observation and transition equations in (18) and (19) are estimated using the Kalman filter algorithm. The maximum likelihood estimates of the hyper parameters, obtained employing the BFGS optimization technique, are presented in Table 2 along with their p-values. These
hyper parameters are employed in the Kalman filter algorithm to generate the
Kalman filtered and smoothed transition of the coefficients of the trade-growth
model. A graphical display of the Kalman filter and smoother estimates of the coefficients of the trade-growth model is presented in Figs. 3 and 4, respectively.

Table 2. Maximum likelihood estimates

Parameters       Estimates        P-value
ψ̂0t             13.4497          0.0000
ε̂t              1.0796 × 10−7
η̂t              3.8785 × 10−7    0.0513
ζ̂t              1.5002 × 10−8    0.0040
ξ̂t              1.0506 × 10−6    0.0024
π̂t              1.4085 × 10−7    0.0513
Log likelihood   10.3001

Note: εt is the estimate of the measurement equation variance; ηt, ζt, ξt, πt are the estimates of the variances of the transition equation.

From Fig. 3, the Kalman filtered estimate of the export parameter (ψ̂1) generally drifted upward, and it takes on the theoretically expected sign throughout the sample period. It trended upwards and declined around 1993, 1996–1998 and 2008/2009. The Kalman filtered import coefficient was equally positive throughout the estimation period, and its transition depicts a declining impact of imports on GDP growth over time. It attained a peak value of about 0.42 in 1984 and thereafter exhibited a declining pattern until it attained its minimum values between 2012 and 2013. The Kalman smoothed estimates of the export (ψ̂1) and import (ψ̂2) parameters behave according to their theoretical expectation, and generally exhibit upward movement, fluctuating in an increasing and decreasing pattern.

5 Detection of Outliers and Structural Breaks


One of the attractions of the state space model is that the auxiliary residuals3 of
the model are useful in the detection of outliers and structural breaks. Harvey
and Koopman (1992) proved that, although the auxiliary residuals are autocorrelated even for a correctly specified model, tests constructed based on the auxiliary residuals are reasonably effective in detecting and distinguishing between outliers and structural change. Consequently, the paper adopted the procedure discussed
in Harvey and Koopman (1992) and Durbin and Koopman (2001). In the state
3 Auxiliary residuals are the residuals associated with the state or transition equation. These auxiliary residuals are estimators of the disturbances associated with the unobserved components.

Fig. 3. Kalman filtered time varying parameters of the trade-growth model.

space framework, outliers are indicated by large (positive or negative) values associated with the observation equation disturbance (εt), and structural breaks are detected by identifying large absolute values in the disturbances associated with the transition equations (Durbin and Koopman 2001).
The calculation of the auxiliary residuals is carried out by putting the model in state space form and applying the Kalman filter and smoother. For an illustration, let εt be the disturbance associated with the observation equation and ηt be the disturbance associated with one of the unobserved components; the standardized smoothed residuals are given by:

$$\rho_t = \frac{\hat{\epsilon}_t}{\sqrt{\mathrm{var}(\hat{\epsilon}_t)}}, \quad t = 1, 2, \ldots, T \qquad (20)$$

$$\rho_t = \frac{\hat{\eta}_t}{\sqrt{\mathrm{var}(\hat{\eta}_t)}}, \quad t = 1, 2, \ldots, T \qquad (21)$$
The basic detection procedure is to plot the auxiliary residuals after they had
been standardized. The standardization is necessary because the residuals at the
beginning and end of the sample period will tend to have higher variances. In a Gaussian model, indications of outliers and structural breaks arise for values greater than 2 in absolute value; see Durbin and Koopman (2001) and Harvey and Koopman (1992) for a detailed explanation of this procedure. Applying the above procedure, the plots of the standardized auxiliary residuals of our model are presented in Fig. 4.

Fig. 4. Plots of standardized auxiliary residuals for the trade-growth model.
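The flagging rule is simple to automate once the smoothed disturbance estimates and their variances are available (for example, from a disturbance smoother, which is not reproduced here); the sketch below standardizes them as in Eqs. (20) and (21) and flags absolute values greater than 2.

import numpy as np

def flag_auxiliary_residuals(dist_hat, dist_var, threshold=2.0):
    """Standardize smoothed disturbance estimates as in Eqs. (20)-(21) and flag
    observations whose absolute standardized value exceeds `threshold`.
    dist_hat, dist_var: smoothed disturbances and their variances, assumed to
    come from a disturbance smoother (not computed here). Applied to the
    observation disturbance this points to outliers; applied to a transition
    disturbance it points to structural breaks."""
    standardized = np.asarray(dist_hat) / np.sqrt(np.asarray(dist_var))
    flagged = np.flatnonzero(np.abs(standardized) > threshold)
    return standardized, flagged

# Hypothetical usage with annual data starting in 1981:
# rho, idx = flag_auxiliary_residuals(eps_hat, eps_var)
# print([1981 + int(i) for i in idx])   # candidate outlier years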
From Fig. 4, an outlier is detected in 1993, and the paper found strong evi-
dence of structural breaks in parameters of the model in 2009. Weak evidence of
structural change is also detected around 1994. The outlier detected in 1993 and
structural break detected in 1994 could be attributed to the drastic change in
policy as a result of the political turmoil in Nigeria with the arrival of the military administration led by the late General Sani Abacha. The military administration re-regulated the economy by capping exchange and interest rates, and the management of oil revenue and the macroeconomy was haphazard, translating into macroeconomic instability (Asemota and Saeki, 2010). In addition, the structural
break detected in 2009 could be attributed to the global financial crisis which occurred in 2008–2009. Due to the interdependence of the global economy, the global financial crisis that started in 2007 affected global economies, including Nigeria. The effect of the crisis on the Nigerian economy was not felt until mid-2008, and
the capital market was greatly affected as foreign investors withdrew and repa-
triated their funds to their home countries. Similarly, the economic downturn in
the United States (a major export destination for Nigeria’s crude oil) affected
the demand and price of oil. Furthermore, the official exchange rate depreciated
by 25.6% between 2008 and 2009, reflecting the demand pressure relative to
supply, with implications for foreign reserves. Moreover, inflation rose from 6% in 2007 to 15.1% in 2008 and remained in double digits until January 2013, when it returned to a single digit.4
Our estimated outliers and break dates are quite significant because they correspond to some important political events in Nigeria and the global financial crisis that occurred between 2007 and 2009. It is important to note that while we have attempted to explain the causes of the structural breaks, further research on Nigeria's trade-growth nexus might be worthwhile. For instance, disaggrega-
tion of imports and exports may give a clearer explanation and capture more
structural changes in the trade-growth model.

5.1 Model Diagnostics Results

Diagnostic tests are used to check the adequacy of the fitted model. The diagnostic tests considered are the heteroscedasticity test H(h), the serial correlation (Q) test and the normality (N) test. The results of the model diagnostics are presented in Table 3.

Table 3. The Model's diagnostic tests

Tests       Statistic   P-value
Q(9-4)      46.61       0.0000
Normality   0.97        0.6144
H(12)       1.24        0.7192

The diagnostic tests for the fitted model are quite satisfactory with the exception of the serial correlation test. The results indicate that the residuals are highly serially correlated, with the Box-Ljung statistic based on the first 9 sample autocorrelations, Q(15) = 46.61, having a p-value of 0.0000. Harvey and Koopman (1992) demonstrated that the auxiliary residuals in a state space model are usually autocorrelated even for a correctly specified model. The closeness of the plot to the 45 degree line suggests that the residuals are normally distributed.5
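For readers who want to reproduce this style of checking, the sketch below computes a Ljung-Box Q statistic, a Jarque-Bera normality statistic, and a simple variance-ratio heteroscedasticity statistic H(h) on a vector of standardized residuals; the lag, degrees-of-freedom adjustment and block length are illustrative choices, not necessarily those used for Table 3.

import numpy as np
from scipy import stats

def ljung_box(resid, lags=9, df_adjust=0):
    """Ljung-Box Q statistic on the first `lags` sample autocorrelations, with
    an optional degrees-of-freedom adjustment for estimated parameters."""
    r = np.asarray(resid, dtype=float) - np.mean(resid)
    n = r.size
    ks = np.arange(1, lags + 1)
    acf = np.array([np.sum(r[k:] * r[:-k]) for k in ks]) / np.sum(r**2)
    q = n * (n + 2) * np.sum(acf**2 / (n - ks))
    return q, stats.chi2.sf(q, df=lags - df_adjust)

def diagnostics(resid, lags=9, h=12):
    """Illustrative versions of the three checks in Table 3: serial correlation,
    normality, and a variance-ratio heteroscedasticity statistic H(h) comparing
    the last h squared residuals with the first h (judged against F(h, h))."""
    resid = np.asarray(resid, dtype=float)
    q, q_p = ljung_box(resid, lags=lags, df_adjust=4)   # 4 estimated variances assumed
    jb, jb_p = stats.jarque_bera(resid)
    H = np.sum(resid[-h:] ** 2) / np.sum(resid[:h] ** 2)
    return {"Q": (q, q_p), "Normality": (jb, jb_p), "H": H}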

4 The Impact of the Global Financial Crisis on Nigerian Economy. November 28, 2013. By KA Project Research. Available at www.proshareng.com.
5 The plot is a graphical display of ordered residuals against their theoretical quantiles. The 45 degree line is taken as a reference line (Durbin and Koopman, 2001).

Fig. 5. Diagnostic plots of the standardized residuals.

6 Conclusions

The importance of trade to the national economy cannot be overemphasized. According to the National Bureau of Statistics (NBS, 2017), trade's contribu-
tion to Nigeria’s GDP was 16.90% in 2017 and contributed 17.1% to the economy
at the end of first quarter of 2018. The country’s economy has been affected by
several shocks such as the 1985-86 crash in oil price, the 1997 Asian Financial
crisis and the global financial crisis of 2008–2009 (Abiola and Asemota 2017).
The domestic economy is equally affected by political uncertainties in the coun-
try. Thus, for proper estimation of the trade-growth nexus, it is imperative to
take these changes into consideration. Consequently, this paper was designed to
investigate structural changes in trade-growth nexus from 1981 to 2016 using
a time-varying parameter model. First, the paper applied the rolling regression
estimation strategy to the parameters of the trade-growth model to justify the
application of the TVP model. The graphical display of the rolling estimate of
the parameters confirmed that the coefficients were indeed unstable over time.
Thus, to estimate the parameters of the model over time and detect the timing
of the structural breaks and outliers, a time varying parameter model in state
space form was employed. The Kalman filter and smoother are very effective
methods for estimating the transition of the parameters over time. The time
varying parameters were estimated as unobserved components in the state vec-
tor of the state space model. The paper demonstrated that the plots of the
auxiliary residuals of the state space model are very useful in detecting struc-
tural changes in time series. The analytical results reveal that the parameters of
Nigeria’s trade-growth nexus were severely affected by the political turmoil in
Nigeria in 1993/1994 and the global financial crisis of 2008/2009. The proximity
of the structural break dates detected in the study to periods of important economic
and political events, such as the political turmoil of 1993 and the global financial
crisis, attests to the significance of the break dates.

References
Abiola, G.A., Asemota, O.J.: Recent economic recession in Nigeria: causes and solutions.
A Paper Presented at the 2017 Institute of Public Administrator of Nigeria, Nicon
Luxury Hotel, Abuja (2017)
Anowor, O.F., Agbarakwe, H.U.: Foreign trade and Nigerian economy. Dev. Ctry. Stud.
5(6), 77–82 (2015)
Arodoye, N.L., Iyoha, M.A.: Foreign trade-economic growth nexus: evidence from Nige-
ria. CBN J. Appl. Stat. 5(1), 121–140 (2014)
Asemota, O.J.: State space versus SARIMA modeling of the Nigeria’s crude oil export.
Sri Lankan J. Appl. Stat. 17(2), 87–108 (2016). https://doi.org/10.4038/sljastats.
v17i2.7872
Asemota, O.J.: Modeling inflow of tourists to Japan: an evaluation of forecasting per-
formance of time varying parameter model. J. Jpn. Assoc. Appl. Econ. 6, 34–60
(2012). Studies in Applied Economics
Asemota, O.J., Chikayoshi, S.: Structural change in macroeconomic time series: a sur-
vey with empirical applications to ECOWAS GDP. J. Kyushu Econ. Sci. Jpn. 48,
25–35 (2010). The annual Report of Economic Science
Durbin, J., Koopman, S.: Time Series Analysis by State Space Methods. Oxford Uni-
versity Press, New York (2001)
Ekpo, A.H., Egwaikhide, F.O.: Export and economic growth in Nigeria: a reconsider-
ation of the evidence. J. Econ. Manag. 1(1), 100–115 (1994)
Hamilton, J.D.: A new approach to the economic analysis of nonstationary time series
and the business cycle. Econometrica 57, 357–384 (1989)
Hansen, B.E.: The new econometrics of structural change: dating breaks in U.S Labor
productivity. J. Econ. Perspect. 15(4), 117–128 (2001)
Harvey, A.C., Koopman, S.J.: Diagnostic checking of unobserved components time
series models. J. Bus. Econ. Stat. 10(4), 377–389 (1992)
Hauwe, S., Paap, R., Van Dijk, D.: An alternative bayesian approach to structural
breaks in time series models. Tinbergen Institute Discussion Paper (2011)
Ito, M., Noda, A., Wada, T.: An alternative estimation method of a time-varying
parameter model. Working Paper, Faculty of Economics, Keio University, Japan
(2017)
Iyoha, M.A.: An econometric analysis of the impact of trade on economic growth in
ECOWAS Countries. Niger. Econ. Financ. Rev. 3(1), 25–42 (1998)
Omoke, P.C., Ugwuanyi, C.U.: Export, domestic demand and economic growth in
Nigeria: granger causality analysis. Eur. J. Soc. Sci. 13(2), 211–218 (2010)
Perron, P.: The great crash, the oil price shock, and the unit root hypothesis. Econo-
metrica 57, 1361–1401 (1989)
Shen, S., Li, G., Song, H.: Is the time-varying parameter model the preferred approach
to tourism demand forecasting? Statistical evidence. In: Matias, A., et al. (eds.)
Advances in Tourism Economics, pp. 107–120. Physica-Verlag, Heidelberg (2009)
Song, H., Witt, S.F.: Tourism Demand Modelling and Forecasting: Modern Economet-
ric Approaches. Pergamon, Oxford (2000)
Sunanda, S.: International trade theory and policy: a review of the literature (2010).
http://www.levyinstitute.org. Accessed 23 June 2018
Theodosiadou, O., Skaperas, S., Tsaklidis, G.: Change point detection and estimation
of the two-sided jumps of asset returns using a modified Kalman Filter. Risks 5, 15
(2017). https://doi.org/10.3390/risks5010015
Xu, J., Perron, P.: Forecasting in the presence of in and out of sample breaks. Working
Paper, Shanghai University of Finance and Economics (2017)
Statisticians Should Not Tell Scientists
What to Think

Donald Bamber(B)

Cognitive Sciences Department, University of California–Irvine,


Irvine, CA 92697, USA
dbamber@uci.edu

Abstract. Some prominent schools of thought regarding the use of


probability and statistics in science are reviewed. Different schools have
different goals for statistics; that is not inappropriate. Mimetic modeling,
whose goal is to mimic Nature’s behavior, is described and advocated.
Both Bayesian analyses and some classical analyses may be appropri-
ately applied to mimetic models. Statistics should not usurp scientific
judgment and, in mimetic modeling, it does not.

Keywords: Interpretations of probability · Mimetic modeling · Frequentist statistics · Bayesian statistics

1 Themes in the Application of Statistics to Science


This paper is concerned with the application of probability and statistics to
the scientific modeling of natural phenomena. The main themes that will be
discussed are the following.
• Probability is anything that satisfies Kolmogorov’s axioms. Different inter-
pretations of probability exist.
• Different schools of thought in statistics often employ different interpretations
of probability. One view is that probability distributions describe uncertainty;
another view is that probability distributions describe variability in Nature.
• Probabilities don’t know how they’re being interpreted. The inventor of a
model may give the probabilities in the model one interpretation. Someone
else may give those probabilities a different interpretation. This, of course,
changes the interpretation of the model.
• Different schools of thought in statistics have different goals for their statisti-
cal analyses; it is not inappropriate for different statisticians to have different
philosophies and, therefore, different goals.
• To select an appropriate statistical approach, a scientist must decide what
his/her goals are.
• The incorrect belief that all statisticians either have the same goals or ought
to have the same goals is a cause of pointless controversy in statistics.
• Some statistics textbooks are philosophically vague; they describe various
statistical analyses but do not explain how those analyses are justified by the
concept of probability presented in the book.

• Bayesian statistical analyses can be justified by different schools of thought


(i.e., epistemic modeling and mimetic modeling). Depending on the school,
the exact same Bayesian analysis may be aimed at achieving dramatically
different goals.
• Although Bayesian and classical/frequentist analyses are often portrayed as
being incompatible, there is a statistical school of thought (i.e., mimetic mod-
eling) that justifies both Bayesian analyses and some classical/frequentist
analyses.
• Some statistical schools of thought (both those employing Bayesian analyses
and those employing classical/frequentist analyses) aim to tell scientists what
their data forces them to think. In my opinion, this is wrong. Statistics should
not usurp scientific judgment and, in mimetic modeling, it does not.

2 Schools of Thought in Probability and Statistics


Different scholars often have different concepts of what probability is. For a
discussion of these different points of view, see the book by Gillies [15], which
I reviewed in [4], and the book by Diaconis and Skyrms [12]. The discussion in
the former book is broader; in the latter book, more incisive.
In my view, there is no point in arguing over which of these concepts of
probability is better or correct [4]. Anything that satisfies Kolmogorov’s axioms
for probability [16], no matter whether it is degrees of belief, or ratios of counts of
possibilities, etc., should be regarded as legitimate probability. Different concepts
of probability may be useful for different purposes.
In much of statistics, probability is used to describe uncertainty. For exam-
ple, in his textbook All of Statistics, Wasserman [22, p. 3] states: Probability
is a mathematical language for quantifying uncertainty. In contrast, in much of
science, probability is used as a description of variability in Nature.
Unfortunately, in much scientific work and in many statistics textbooks,
although probability is employed as a central concept, that concept is only
vaguely explained and is left unclear.
Stamp Collecting vs. Coin Collecting. There simply isn’t any conclusive argu-
ment showing that one concept of probability is correct and the others wrong.
To argue over this issue is as pointless as arguing over whether stamp collecting
is better than coin collecting. The stamp collector may argue that stamp collect-
ing is better than coin collecting because stamps are more colorful than coins.
And the stamp collector may think that the coin collector is wasting time and
resources on a foolish occupation. But the coin collector is doing what he/she
prefers and cannot be induced to change that preference by logical argument.
Similarly, it must be recognized that different statisticians employ different
kinds of probability because they have different kinds of goals. Statistician A
may fail to realize that his/her goals are different from Statistician B’s goals. As
a result, Statistician A may say to Statistician B: Your statistical methods are
no good. In reality, Statistician A would have been more accurate if he/she had
said to Statistician B: Although your statistical methods may be appropriate for
achieving your goals, they are not appropriate for achieving my goals.

An example of a failure to appreciate that different statisticians have different


goals for their analyses is provided by a statement of Dennis Lindley’s [17]:
Every statistician would be a Bayesian if he took the trouble to read the literature
thoroughly and was honest enough to admit that he might have been wrong. This
statement might be correct if every statistician had the same goals as Lindley.
But they don’t all have the same goals.

3 Probability as Degree of Belief: Dutch Book Arguments


It is often said that probability quantifies uncertainty. But, what is meant by
uncertainty? One answer has been that, when we talk about uncertainty, we
are talking about degree of belief in propositions/events. Furthermore, degree of
belief in an event (such as “it will rain tomorrow”) varies from person to person.
Thus, when we talk about degree of belief, we are talking about some person’s
degree of belief.
A person’s degree of belief in various events is often defined in terms of those
bets about those events that the person is willing to make and those bets that
the person is not willing to make. Philosophers and probabilists have formulated a
number of arguments having a basic similarity to each other showing that, when
defined in terms of bets, a person’s degrees of belief should conform to the laws
of probability. (For historical reasons that need not concern us, these arguments
are called Dutch book arguments.)
In addition to Dutch book arguments, there is another approach to justifying
personal probability that was developed by Leonard J. Savage [20]. I will not
discuss Savage’s work here as it is more complicated than Dutch book arguments.
One form of the Dutch book argument is presented here. Other forms are
discussed in [15, Chap. 4] and in [12, Chap. 2]. In fact, over a period of gener-
ations, philosophers, probabilists, and statisticians have developed a collection
of intellectually very impressive justifications for personalist probability—again
see [12, Chap. 2].
Dutch book arguments tend to be confusing because one must keep track of
multiple gambles, together with all possible outcomes of those gambles, and the
net gain or loss from the multiple gambles. Furthermore, having to keep track of
all the details in the argument makes it hard to discern the intuitive idea behind
the argument. To avoid these problems, I formulated the kind of Dutch book
argument presented here that is designed to be less confusing.

3.1 The House and Your Contract with It

The form of the Dutch book argument presented here asks and answers the
question of what your strategy should be if you signed a contract with a gam-
bling establishment that required that your gambling be conducted according to
certain rules (described below) that are specified in the contract.
The House. Assume that a gambling establishment that we will call the House
has selected as its unit of currency some arbitrary amount, say 100 Thai baht.

Suppose that the House issues conditional promissory notes of two kinds: unit
and fractional. Consider an example of a unit note. If A is some event that hasn’t
happened yet, such as rain next week or no rain next week, then the promissory
note PN(A) promises that the House will pay the holder one unit of currency
if the event A occurs and nothing otherwise. Unit notes may be divided into
fractional parts. Suppose that 0 ≤ α ≤ 1. Then, the α-fractional part of PN(A)
is denoted α PN(A). This fractional note pays off α currency units if A occurs
and nothing otherwise.
Your Contract with the House: Divisibility of Notes. If you are the holder of a
promissory note PN1 , the House can require you to trade it for the two notes
α PN1 and (1 − α) PN1 , where 0 ≤ α ≤ 1. Conversely, if you hold the two notes
α PN1 and (1 − α) PN1 , the House can require you to trade them for the note PN1 .
Your Contract with the House: Start of Gambling Session. At the start of the
gambling session, you are required to pay one currency unit to purchase from
the House the promissory note PN(Ω), where Ω is the universal or sure event.
The note PN(Ω) is just as good as one unit of currency, because it is a promise
by the House to pay the note holder one unit of currency, no matter what.
Your Contract with the House: Trading. If you hold the promissory note PN1 ,
the House may require you to trade it for another promissory note PN2 provided
that you have agreed beforehand that the trade is fair. (How the House finds
out from you which trades you regard as fair is explained below.)
Your Contract with the House: Fractional Trades. Suppose that you have told
the House that, if demanded by the House, you assent to trade the note PN1
for the note PN2 or vice versa. Then the contract requires that you assent to
any fraction of that trade. So suppose that β is between zero and one. Having
agreed to trade the note PN1 for the note PN2 or vice versa, you also agree to
trade the note β PN1 for the note β PN2 or vice versa.
Your Contract with the House: End of Gambling Session. After executing any
trades with you that the House wishes (but only trades that you regard as fair),
the House declares the gambling session to be over and pays off any promissory
notes that you hold.
Your Contract with the House: Elicitation of Your Judgments of Fairness. It was
mentioned above that the House finds out from you what trades you consider
“fair”. This is done as follows. Given two events A and B, the House may require
you to specify a fair “exchange rate” between the promissory notes PN(A ∩ B)
and PN(A). In other words: the House requires you to specify some number α
such that you agree to trade PN(A ∩ B) for α PN(A) and, conversely, you agree
to do the trade in the reverse direction. The value of α chosen by you is denoted
exch(B|A). Thus, you agree to trade PN(A ∩ B) for exch(B|A) PN(A) and vice
versa. Abbreviation: The exchange rate exch(B|Ω) will be abbreviated exch(B).
Note that these trades are a kind of bet. If you agree that it is fair to trade
exch(B)PN(Ω) for PN(B) and vice versa, then (in essence) you are saying that you are
willing to pay the House exch(B) currency units for a promise from the House
to pay you one currency unit if the event B occurs. Conversely, you are willing
(in essence) to take a sure gain of exch(B) currency units in return for giving up
the risky opportunity to gain one currency unit if B occurs. Whichever direction
the trade is run, you are gambling.
An Aside: Exchange Rates as Operational Definitions of Degrees of Belief. We
may regard your chosen exchange rates as operationally defining your degrees of
belief. Thus, the stronger your belief in the event B, the higher should be your
exchange rate exch(B) relative to exch(Ω) = 1. Analogously, the stronger your
belief in the event A ∩ B relative to your belief in the event A, the higher should
be your exchange rate exch(B|A).

3.2 Have You Been Rational?


Rationality of Your Judgments of “Fairness” of Trades. Have you been rational
in your judgment of which trades of promissory notes are fair? We will say that
your judgments of fairness have been irrational if there is some sequence of trades,
judged fair by you, that converts your starting stake of PN(Ω) into α PN(Ω),
where 0 ≤ α < 1. If there is such a sequence of trades and if that sequence
were executed, then you would suffer a sure loss. You would have started with
a promissory note that was the equivalent of one unit of currency and ended up
with a promissory note that was the equivalent of α < 1 units of currency. (If
the House has executed a sequence of trades with you that caused you a sure
loss, we will say that the House made a Dutch book against you.)
The essence of Dutch book arguments is that, if you do not wisely choose the
exchange rates exch(B|A) across events A and B, then the House can require
you to make trades where you, in effect, “buy high and sell low”, resulting in sure
loss for you and a sure gain for the House. In other words, if you do not wisely
choose your exchange rates, you can come out on the losing end of arbitrage.
In particular, Dutch book arguments show that, if your chosen “fair”
exchange rates do not obey the laws of probability, then the House can exe-
cute a sequence of trades (all of which you regard as fair) that will cause you to
suffer a sure loss. In particular, you must choose exchange rates such that

exch(Ω) = 1 (1)

and, for all events A,


exch(A) ≥ 0. (2)
In addition, for any disjoint events A and C, you must choose exchange rates
such that
exch(A ∪ C) = exch(A) + exch(C). (3)

3.3 Dutch Book Treatment of Conditional Probability


Dutch book arguments also show that, given any two events A and B, if you
have chosen exch(A) > 0, then you must choose exch(A ∩ B) and exch(B|A)
such that
exch(B|A) = exch(A ∩ B) / exch(A), when exch(A) > 0. (4)

If your chosen exchange rates do not conform to Eq. 4, then the House can require
you to make a sequence of trades (all considered fair by you) that result in a
sure loss for you.
Proof of Eq. 4. We will show that the exchange rates chosen by you must satisfy
Eq. 4 if you do not wish to be vulnerable to a Dutch book. Suppose that you
chose exch(A) > 0, but your chosen value for exch(B|A) does not satisfy Eq. 4.
There are two cases to consider: Either

exch(A ∩ B) > exch(B|A) exch(A), (5)

or the reverse inequality. Consider the case where the inequality (5) holds; the
case of the reverse inequality being analogous. Recall that you start the gam-
bling session holding the promissory note PN(Ω). Then, the House may, at its
discretion, execute the following sequence of trades with you (all of which you
consider fair).

Trade 1: The House takes from you PN(Ω) and gives you the two notes:

exch(A ∩ B)PN(Ω) and [1 − exch(A ∩ B)]PN(Ω).

Trade 2: The House takes from you exch(A∩B)PN(Ω) and gives you PN(A∩B).
Trade 3: The House takes from you PN(A ∩ B) and gives you exch(B|A)PN(A).
Trade 4: Next comes a fractional trade. The House takes from you
exch(B|A)PN(A) and gives you

exch(B|A)[exch(A)PN(Ω)].

Trade 5: The House takes from you the two notes

[1 − exch(A ∩ B)]PN(Ω) and exch(B|A)[exch(A)PN(Ω)]

and gives you the single note

α PN(Ω), where α = 1 − exch(A ∩ B) + exch(B|A)exch(A).

But Eq. 5 implies that α < 1. Thus, the House has executed a sequence of trades,
regarded as fair by you, that has resulted in a sure loss for you.
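
To make the sure loss concrete, the following short calculation traces the net effect of Trades 1-5 for one hypothetical set of exchange rates that violates Eq. 4; the numerical values are illustrative assumptions, not values taken from the text.

# Hypothetical exchange rates chosen so that inequality (5) holds, i.e. Eq. 4 is violated.
exch_A   = 0.5                 # exch(A)
exch_B_A = 0.4                 # exch(B|A)
exch_AB  = 0.3                 # exch(A ∩ B); 0.3 > 0.4 * 0.5, so (5) holds

# You start with PN(Ω), worth one currency unit for sure; after Trades 1-5 you hold α PN(Ω) with
alpha = 1 - exch_AB + exch_B_A * exch_A
print(alpha)                   # 0.9 < 1: a guaranteed loss of 0.1 currency units
assert alpha < 1               # a sure loss whenever inequality (5) holds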
Exchange Rates Must Conform to the Laws of Probability. Thus, Dutch book
arguments have shown that, if your exchange rates are rationally chosen (in the
sense of not allowing the House to trade you into a sure loss), then your exchange
rates must obey Eqs. 1, 2, 3, and 4. In other words, they must conform to the
laws of probability.
Rationality of Belief. As mentioned in an aside earlier, your chosen exchange
rates may be taken as an operational definition of your degrees of belief. Thus,
what Dutch book arguments show is that, for your degrees of belief to be rational,
they should conform to the laws of probability.

3.4 The Dutch Book Treatment of Belief Updating

The Dutch book arguments discussed above show that, at any given moment of
time, your beliefs must conform to the laws of probability if you are rational.
Because those Dutch book arguments apply only to a moment in time, they are
said to be synchronic.
Beliefs change over time as we acquire new information. Is there a theory of
rational belief change? Yes, there is. So-called diachronic Dutch book arguments
assert that, when you acquire new information, the only way to rationally update
your beliefs in the light of the new information, is to condition on the new
information. (Just what is meant by conditioning will be explained soon.)
Diachronic Dutch book arguments are more complicated than synchronic
ones and, therefore, will not be explained in full detail here. The interested
reader may find a full explanation in [12, Chap. 2]. The general idea is this:
You are back gambling with the House again. At a moment in time where you
don’t know whether the event A has occurred or not, the House demands that
you state what your new exchange rate between the promissory notes PN(B)
and PN(Ω) would be if you were to learn that A had occurred. Let us denote
that new exchange rate by exchA (B). A diachronic Dutch book argument shows
that, if exch(A) > 0, you must choose exchA (B) to be equal to exch(B|A). If
you choose any other value for exchA (B), then the House can demand that you
make a sequence of trades (i.e., bets) with it that will result in a sure loss for
you.
Change of Notation. Since your degrees of belief are operationally defined by your
chosen exchange rates and since the latter must conform, if you are rational, to
the laws of probability, the notations exch(B), exch(B|A), and exchA (B) for
exchange rates will be replaced by the probability notations Pr(B), Pr(B|A),
and PrA (B), which will be used to represent your degrees of belief.

3.5 Epistemic Bayesianism

By epistemic Bayesianism, I mean the philosophical school of thought based on


the two assertions:

• Rational degrees of belief conform to the laws of probability.


• When new information is acquired, the only rational way to update one’s
personal probabilities (i.e., personal beliefs) is to condition those probabilities
on the new information.

Terminology. One’s old beliefs are often called prior beliefs and one’s new,
updated beliefs are often called posterior beliefs.
Putting matters loosely:

• Rationality of belief is a kind of consistency of belief. Conformance to the


laws of probability enforces that consistency.
• When new information forces you to change your beliefs, you should keep your
new beliefs close to your old beliefs. You achieve that closeness by conditioning

your old beliefs on the new information and, in this way, arriving at your new
beliefs.

Remark. Epistemic Bayesianism can tell you whether your beliefs are rational.
But having rational beliefs is no guarantee of having accurate beliefs.
Epistemic Bayesianism as a Vision of Rational Science. A high-level and par-
tial description of the activities of scientists is that (a) they start with opin-
ions/beliefs about Nature, (b) they collect data, and (c) they revise their opin-
ions in light of the new data that they collected. Given this characterization
of science, the epistemic Bayesian prescription for rational science, is that the
scientists’ initial opinions should conform to the laws of probability and that
they should revise their opinions by conditioning their initial opinions on their
collected data.

3.5.1 A Good Recipe for Science?


Is this epistemic Bayesian prescription a good recipe for doing science? I don’t
think it is. Scientists (and people in general) need to change their beliefs in ways
that epistemic Bayesianism deems irrational.
Changing Your Mind Without New Evidence is Irrational. For example, suppose
that you and your friend both have rational beliefs about upcoming baseball
games in Puerto Rico. However, your beliefs are based on ignorance, whereas
your friend’s beliefs are based on the knowledge that he has acquired as a fan
of Puerto Rican baseball. Your friend tells you what his beliefs are about the
upcoming games. But, you say to yourself, “I am as smart as my friend. Why
should I give up my beliefs and adopt his beliefs as my own?” So, you keep your
old beliefs. But, then, you think some more. You know that your friend’s beliefs
are rational. So, if you were to adopt his beliefs, your new beliefs would not
be irrational. Furthermore, your friend’s beliefs are presumably more accurate
than your own. Therefore, in order that your beliefs could be more accurate, you
change your mind and adopt your friend’s beliefs as your own.
Unfortunately, by epistemic Bayesian standards, you changed your beliefs in
an irrational way. What was irrational about your belief change? Notice that
your change of mind was not prompted by any new information. Earlier, when
you learned what your friend’s beliefs were, you decided not to change your own
beliefs. It was only later that you changed your beliefs. And that belief change
was prompted by a desire for greater accuracy in your beliefs, not because you
had learned anything new.
Not receiving new information is the same as receiving information of the
occurrence of the universal/sure event Ω. If you are rational by epistemic
Bayesian standards, when you are informed of Ω, you should change your belief
in any B from Pr(B) to

PrΩ (B) = Pr(B|Ω) = Pr(B).

In other words, there should be no change in your beliefs. (This can be demon-
strated by a diachronic Dutch book argument that I will not present here.) Had
your change of mind occurred while you were gambling, the House could have
made you suffer a sure loss.
The irony here is that, by changing your mind so that your new belief would
be more accurate/realistic, your new belief became inconsistent with your old
belief. And, thus, your belief change was irrational by epistemic Bayesian stan-
dards. The vice of irrational belief change outweighs the virtue of increased
accuracy of belief.
Is There Room for New Ideas? Science needs to evaluate new ideas. For example,
at one time, the idea of an Ice Age was a new idea. So, let B denote the idea
that there was once a sheet of ice, thousands of meters thick, covering most
of Canada and northern United States [23, pp. 12–14]. Hundreds of years ago,
this idea would not have occurred to most people. Presumably, for such people,
Pr(B) would be zero. But, then there is no evidence A (that is credible in the
sense of having Pr(A) > 0) that can cause Pr(B|A) to be greater than zero. This
is because:
Pr(B|A) = Pr(A ∩ B)/Pr(A) ≤ Pr(B)/Pr(A) = 0/Pr(A) = 0. (6)
How then can epistemic Bayesianism explain how unthought-of ideas can ratio-
nally come to be believed?
The problem with epistemic Bayesianism is that it envisions only one way
of forming new beliefs, which is by obtaining new information and conditioning
one’s current beliefs on the new information. This process repeats again and
again as new information is acquired.
The Ball-and-Chain of Old Beliefs. To summarize: Scientists frequently need to
change their beliefs, sometimes radically. But, in the epistemic Bayesian depic-
tion of rational science, scientists’ old beliefs are like a ball-and-chain that keeps
them from changing their position much.

4 Probability in the Pragmatic Bayesian School of Statistics

Pragmatic Bayesianism is very different from epistemic Bayesianism. Unlike the
latter school of thought, the former school regards it as unimportant to precisely
specify what probability is or to precisely specify a theory of statistical inference.
To illustrate the attitudes of the pragmatic Bayesian school, I shall use the
textbook Bayesian Data Analysis, Third Edition [14], the title to be abbreviated
here BDA3. This is an excellent book. It won the 2016 DeGroot Prize from the
International Society for Bayesian Analysis. For descriptions of various Bayesian
analyses and explanations of how to carry them out, the book is superb. Per-
sonally, I have found it helpful. It is my go-to book for Bayesian statistical
methodology.
The authors of BDA3 state on p. 4 that, rather than “argue the foundations
of statistics”, they prefer to concentrate on the “pragmatic advantages of the
Bayesian framework”. The problem that this statement poses for many readers
is this: In the philosophy of pragmatism, one judges the value of a method by
the success that it achieves. But, what is “success”? One of the purposes of
the foundations of statistics is to formulate what it is that statistical inference
attempts to achieve. If we are not told what Bayesian statistics attempts to
achieve, how can we judge whether it has been successful?
BDA3’s View of Probability. BDA3 states (pp. 11–13) that probability is used
to measure uncertainty and several kinds of uncertainty are described. The book
does not restrict the application of probability to one kind of uncertainty only.
And that lack of restriction seems to be the point: A pragmatic-Bayesian statis-
tical model may include multiple kinds of uncertainty.
BDA3’s Approach to Statistical Methodology. For all its excellence, BDA3 is a
puzzling book. Although good at explaining how to do a Bayesian analysis, it
leaves unclear why you would want to do a Bayesian analysis. Of the many
different kinds of statistical analysis that you might do, why would you want to
condition your model on the data? BDA3 doesn’t answer that question.
In contrast to BDA3, epistemic Bayesians can tell you why, in their view,
you should do Bayesian statistical analyses; the reason being that a Bayesian
analysis in which you condition your beliefs on new data is the only rational way
to update your beliefs. But that explanation is not found in BDA3.
Aside. To do Bayesian statistical analyses, one needs a broader concept of con-
ditioning than has been discussed here. For a broad treatment of conditioning,
see Chap. 5 of Pollard’s textbook on probability theory [19]; particularly rele-
vant to Bayesian statistics is Theorem <12> and the corollary Example <13>.
(To read Pollard’s book, however, you need to learn his unusual de-Finetti style
notation.) And, for finite-dimensional spaces, see the series of four papers written
by I.R. Goodman with input from colleagues [7].
We might surmise that pragmatic Bayesians agree with epistemic Bayesians
(without saying so) that the reason for doing Bayesian statistical analyses is that
such analyses are the only way to rationally update one’s beliefs. But that sur-
mise is questionable. Pragmatic Bayesians sometimes appear to be unconcerned
with whether belief change is rational.
Concern for Accuracy of Models. If pragmatic Bayesians were concerned only
with rationally updating their beliefs, they would stop further examination of
their models once they had computed the model’s posterior. But they do not
stop there. At the very beginning of BDA3 (Sect. 1.1), it is stated that there is
a further step involved in doing Bayesian statistics. After computing a model’s
posterior, one needs to evaluate how well the model fits the data. This is called
model checking and is the subject of Chap. 6 in BDA3. The purpose of model
checking is to look for deficiencies in the model so that the model can be modified
to correct those deficiencies.
By the standards of epistemic Bayesianism, it is irrational to modify a model
because a model check has indicated a deficiency in the model. The epistemic
Bayesian viewpoint is that, once the modeler has computed a model’s posterior,
the modeler has rationally updated his/her probabilities. To modify the model
a second time because a model check shows it to be deficient is irrational (by
epistemic Bayesian standards).

To summarize: Pragmatic Bayesians engage in two basic practices:


1. Compute model posteriors by conditioning on data. (That is rational by epis-
temic Bayesian standards.)
2. Check models to see how well they fit the data and, then, modify deficient
models. (That’s irrational by epistemic Bayesian standards.)
How do Pragmatic Bayesians Justify Their Practices? That is quite unclear.
Perhaps the reason for that lack of clarity is that there is not a consensus in the
pragmatic Bayesian community. They may agree on what the proper practice is,
but not on how to justify that practice.
A Justification. In Sect. 6, I present an approach to modeling that justifies the
practices of pragmatic Bayesians. However, in this approach to modeling, prob-
ability is interpreted differently than in either the pragmatic or the epistemic
Bayesian communities. In this approach, probability describes variability, not
uncertainty.
Aside. It is sometimes reasonable to be fairly unconcerned with how probability
is to be interpreted when doing a Bayesian analysis. Thus, my colleagues and
I have used a Bayesian second-order probability approach to defining a non-
monotonic logic [3,8–10]. The resulting logic makes sense under more than one
interpretation of probability.

5 Models as Hypotheses About Natural Mechanisms


In much of statistics, probability is conceived of as being a measure of uncertainty
[22, p. 3]. But, in much of science, there is a different school of thought in which
probability distributions are used to describe the variability of Nature’s behavior.

5.1 Natural Mechanisms


Much of science (but not all science) is concerned with formulating models of
the mechanisms by which Nature operates. Examples: (a) The mechanism of
Newtonian gravity that causes planets to have Keplerian orbits. (b) The mecha-
nism of disease spread. Some models of these mechanisms are deterministic (e.g.,
Newtonian gravity), but others are stochastic (e.g., the spread of disease).
A Simple Model of a Stochastic Mechanism. As a particularly simple example of
a stochastic model of a natural mechanism, I present a model of human paired-
associate learning described in [2, Chap. 3] based on an experiment and theoret-
ical work of Bower [11]. I’ll call this model the “all-or-none learning model”. (It
is also known as the “one-element model”.) In the experiment, the subjects (who
were college students) were shown paired associates consisting of two consonants
paired with either the number 1 or the number 2 (e.g., mk-2 ). The subjects
were to do their best to learn these pairs. Later, when shown the left member of
a pair (e.g., mk ), the subject was supposed to respond with the right member
of the pair (i.e., 2 ). Subjects were presented with the pairs, one at a time, again
and again until the pairs had been learned to a criterion.

Human learning is a natural phenomenon that has been brought into the
laboratory by Bower’s experiment. The all-or-none learning model is a hypothesis
about the natural mechanism underlying paired-associate learning in Bower’s
experiment. The model makes the following assumptions for each subject and
each paired associate.

1. At each trial, the paired associate was in either a learned state or an unlearned
state.
2. At the start of the experiment, each paired associate was in the unlearned
state.
3. Once in the learned state, the paired associate stayed in that state.
4. On each trial, the paired associate, if currently unlearned, had a probability
c of being learned.
5. When a learned paired associate was tested, the subject would always
respond with the correct response.
6. When an unlearned paired associate was tested, the subject would guess the
correct response with probability 1/2.
7. In addition, appropriate conditional independence assumptions were made.

To reiterate: The above assumptions constitute a hypothesis about the natural


mechanism underlying paired-associate learning.
In the data analysis, the single parameter c was estimated from the data
and the distributions of various statistics were derived from the model. E.g.,
the probability of an error on trial n, the total number of errors, trial number
of last error, and length of runs of errors. Graphs comparing predictions and
observations were made [2, Sect. 3.3]; eyeball inspection of those graphs showed
impressive fits. However, some later experiments showed some deviations from
the all-or-none learning model [2, Sect. 3.5].
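
As an illustration of how assumptions 1-7 generate data, the following sketch simulates response sequences from the all-or-none learning model in Python. The learning probability, number of trials, and number of items are arbitrary choices for the example, not values from Bower's experiment.

# A minimal simulation sketch of the all-or-none learning model's assumptions.
import numpy as np

def simulate_item(c, n_trials, rng):
    """Simulate a correct/incorrect response sequence for one paired associate."""
    learned = False                               # assumption 2: start in the unlearned state
    responses = []
    for _ in range(n_trials):
        if learned:
            responses.append(True)                # assumption 5: always correct once learned
        else:
            responses.append(rng.random() < 0.5)  # assumption 6: guess correctly with prob. 1/2
            if rng.random() < c:                  # assumption 4: learn with probability c
                learned = True                    # assumption 3: learning is permanent
    return responses

rng = np.random.default_rng(1)
data = np.array([simulate_item(0.3, 10, rng) for _ in range(500)])
print(np.round(1 - data.mean(axis=0), 3))         # error rate by trial, about 0.5 * (1 - c)**(n - 1)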

5.2 Probability as a Description of Natural Variability

In the all-or-none learning model, probability was used as a way of describing


the variability of Nature. Thus, on some trials where a paired-associate was in
the unlearned state, it would be learned on that trial; on other trials, it would
not be learned. To describe Nature’s variability, the model states that learning
occurs with probability c, and no learning with probability 1 − c.
More generally, there is a long tradition in experimental psychology of using
probability to describe natural variability [2, Sect. 1.3].
Natural Propensities. What might we mean when we say that probability
describes natural variability? Let’s go back to the example of a trial where a
paired associate is in an unlearned state at the start of the trial. The all-or-none
learning model assumes that, during the trial, the paired associate is learned
with probability c and is not learned with probability 1 − c. Call this the learn-
ing assumption.
One interpretation of the learning assumption is that Nature does not behave
deterministically; instead Nature has a propensity, describable by probability, to
sometimes behave one way and sometimes another. (Philosophers of science have
formulated a number of theories of propensity: see [15, Chaps. 6 & 7] and [12,
pp. 76–77].)

5.2.1 A Rejected Reinterpretation


Could We Not Reinterpret Probabilities Descriptive of Nature as Degrees of
Belief ? There is a rationale for doing so. If a modeler posits that learning will
occur with probability c, then the modeler is uncertain whether learning will
occur. And it would be reasonable to infer that the modeler has degree of belief
c that leaning will occur. In this way, probabilities could become measures of
uncertainty, consonant with the thinking of many statisticians.
Doing exactly this is appealing to many of my colleagues. They would inter-
pret all the probabilities in the all-or-none learning model as degrees of belief.
In addition, they would adopt a prior distribution for the learning parameter c,
thus producing a full epistemic Bayesian model.
Personally, I find this course thoroughly unappealing. Let’s suppose that I
am the modeler. A model that started out as a description of a natural phe-
nomenon has become a description of my beliefs. But, my degrees of belief are
operationalized in terms of what bets I would make and not make. Thus, a model
that started out as a description of a natural phenomenon has become a descrip-
tion of my gambling behavior. Personally, I think that a description of Nature
is interesting, whereas a description of my gambling behavior is boring.

6 Models as Mimics of Natural Mechanisms

In this section, I advocate an approach, which I have named mimetic modeling,


to the stochastic modeling of natural phenomena.

6.1 At Best, Models only Approximate Natural Mechanisms

Among modelers who are attempting to understand natural mechanisms, most


do not regard their models as being absolutely correct descriptions of natural
mechanisms and as impossible to improve. That is certainly the attitude expressed
toward the all-or-none learning model by the authors of [2] in their Sect. 3.5. So,
let us consider that model. Is it plausible that the model could be a true and complete
description of the natural mechanism of paired-associate learning? No, it is not.
However paired-associate learning occurs, it involves huge numbers of neurons
in complex networks. It simply isn’t plausible that a simple model could be an
accurate and complete description of a reality that is almost certainly extremely
complex. So, taken as a description of reality, the all-or-none learning model is
almost certainly wrong.
Furthermore, in most domains of science, models are not absolutely correct
descriptions of natural mechanisms. Most models are almost certainly wrong.

6.2 Overview of Mimetic Modeling


Ideally, we would like to know what the natural mechanisms underlying natural
phenomena are. However, most natural phenomena are so complex that we can
never hope to fully and accurately describe them in our models. Consequently,
we need another way to think about what it is that models can realistically hope
to achieve.
In mimetic modeling [5,6], the goal is not to describe the natural mechanism
underlying a natural phenomenon. Rather our goal is to design an artificial
mechanism that mimics the behavior of the natural phenomenon.
Fortunately, when we take up mimetic modeling, we don’t have to throw
away all our old models that had been intended to be descriptions of natural
mechanisms. Instead, we can reinterpret those models as designs for artificial
mechanisms. In particular, the all-or-none learning model [2,11] can be reinter-
preted in that way.

6.2.1 Mimickers and Models


Let us begin with a very general explanation of mimickers and models. More
detail will be given later. A mimicker is an artificial probabilistic mechanism
that produces behavior; it is something that we build. A mimetic model is a
design for building a mimicker; it specifies the probabilities of the mimicker’s
behaviors. We don’t have to build the mimicker and run it to know how it will
behave. If we have the model for the mimicker (i.e., if we have the design of its
mechanism), we can predict the mimicker’s behavior using probability theory.
Interpretation of Probability in Mimickers. The behavior of mimickers is designed
to be variable. More particularly, a mimicker has been designed to have a propen-
sity, describable by probability, to sometimes behave one way and sometimes
another. Because a mimicker is a built device, such a propensity is an engineered
propensity.
The Goal of Mimicry. A mimicker for a natural phenomenon is an artificial prob-
abilistic mechanism whose behavior (it is hoped) will mimic the behavior of the
natural phenomenon. To be more specific, when we speak of the behavior of a
mimicker, we mean the artificial data produced by the mimicker. It is hoped
that the artificial data produced by the mimicker will resemble the empirical
data produced by the natural phenomenon. A mimicker may mimic well or it
may mimic poorly. We hope that the behavior of the mimicker approximates the
behavior of the natural phenomenon, but it may not. The better the approxima-
tion, the better the mimicker and, thus, the better the design of the mimicker
(i.e., the better the model).
When we propose a design for a mimicker of a natural phenomenon, we are
not making a claim that the artificial mechanism in the mimicker matches the
natural mechanism underlying the natural phenomenon. In particular, although
the mimicker is a stochastic mechanism, we do not claim that the natural mech-
anism is stochastic. It might be stochastic; or it might be deterministic, or even
chaotic. When we design a mimicker, all we care about is that its behavior should
approximate the behavior of the natural phenomenon.

When models are conceived of as hypotheses about natural mechanisms,


strictly speaking, those hypotheses are either correct or incorrect. And, if we
just collect enough data, virtually every such hypothesis can be discredited.
On the other hand, mimetic models are neither correct nor incorrect. Rather
they approximate the behavior of natural phenomena more or less well. How,
then, should we evaluate them?
Consumer-Magazine Paradigm for Model Evaluation. Various magazines are
published that evaluate products to help consumers decide which products to
buy. Consider automobiles, for example. A consumer magazine might evaluate
various automobiles on criteria such as: distance traveled on a tank of gas, num-
ber of passenger seats, space available for cargo, etc. The magazine does not
specify which car is best, because different consumers have different needs. For
example, one consumer might want more seats in a car, but another consumer
might want to minimize the frequency of fueling. As a result, different consumers
will buy different cars.
Analogously, different scientists may have different goals for a mimetic model
of a natural phenomenon. One scientist might want the model to mimic one
statistic well; the other might want a different statistic to be mimicked well.
This might lead the two scientists to differently evaluate the adequacy of the
mimetic model.
In addition, a scientist might regard, as unimportant, a statistically signif-
icant deviation of a model’s behavior from Nature’s behavior. If the model’s
behavior showed only a small deviation from Nature’s behavior and if no better
model was known, the scientist might regard the model as “good enough”, even
though the small deviation was highly significant.
Empirical Distinguishability of Models. For a given experiment, two dissimilar
models might mimic Nature’s behavior about equally well. To empirically dis-
tinguish these models, the challenge for the experimenter is to design a new
experiment in which Nature’s behavior is mimicked well by one model, but not
the other. For example, in the study of the temporal aspects of human cognition,
one of the goals has been to determine whether cognitive processes occur serially
or in parallel. It has been surprisingly difficult to design experiments that can
distinguish serial processing from parallel processing; considerable ingenuity has
been needed to solve this problem [21].
Subjectivity. As described above, there can be considerable subjectivity in a sci-
entist’s evaluation of a mimetic model. However, this is subjectivity of judgment,
rather than the subjectivity of belief found in epistemic Bayesian modeling.
Uncertainty. The purpose of mimetic models is to imitate natural phenomena;
such models are not intended to be descriptions of a scientist’s uncertainties. For
example, a scientist might have doubts about whether a scientific instrument
measures what it purports to measure. Such an issue would be dealt with by
the scientist exercising scientific judgment and not by statistically modeling the
scientist’s uncertainties. In mimetic modeling, it is Nature that is modeled, not
the scientist.

6.3 Parameters in Mimetic Modeling

Up to now, our discussion of mimetic models has been at such a high level that
there has been no discussion of parameters in mimetic models. We will now
discuss the role of parameters in mimetic modeling. We will show how, without
inconsistency, both classical and Bayesian statistical analyses can be applied to
mimetic models.

6.3.1 Classical Statistical Methods Applied to Mimetic Models


A mimetic model with parameters may be regarded either as (a) a partially-
specified model or as (b) a family of fully-specified models, with one fully-
specified model assigned to each possible value of the parameter vector. The
latter is the point of view taken here. When empirically evaluating a paramet-
ric mimetic model, one may use classical statistical methods to estimate the
parameter vector and, then, predictions may be derived from the fully-specified
model assigned to that parameter vector. Those predictions can then be com-
pared with the empirical observations. As an example: Eyeball comparison of
predictions and observations was the way that the all-or-none learning model
was evaluated [2, Sect. 3.3].
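
As a concrete (and deliberately simple) illustration of this classical route, the sketch below estimates the all-or-none model's parameter c by least-squares matching of the predicted error curve 0.5(1 − c)^(n−1) to per-trial error proportions; this grid-search estimator and the simulated data are assumptions made for illustration, not necessarily the estimation method used in [2].

# A simple illustrative estimator of c (not necessarily the method used in [2]).
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_items, c_true = 10, 500, 0.3
trials = np.arange(1, n_trials + 1)

# Observed per-trial error proportions (simulated here; real data would come from an experiment)
p_error = 0.5 * (1 - c_true) ** (trials - 1)
errors = rng.binomial(n_items, p_error) / n_items

# Grid search for the least-squares fit of the predicted error curve
grid = np.linspace(0.01, 0.99, 99)
sse = [np.sum((errors - 0.5 * (1 - c) ** (trials - 1)) ** 2) for c in grid]
c_hat = grid[int(np.argmin(sse))]
print(c_hat)                     # should lie near the generating value 0.3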

6.3.2 Bayesian Statistical Methods Applied to Mimetic Models


Parallel streams. As just described, a mimetic model with a parameter vector
may be regarded as a family of fully-specified models that are each assigned to
a parameter vector. Now each fully-specified model is a design for a mimicker,
i.e., an artificial stochastic mechanism that generates a vector (or, speaking
figuratively, a stream) of artificial data. So, at this point, we have a family of
mimickers that generate parallel streams of artificial data. We call such a family
a parametric family of mimickers.
Random Choice of Data Stream. Out of this family of parallel data streams, we
want to extract just one stream. We do that by choosing one stream at random.
Specifically, we adopt a distribution over the space of parameter vectors and then
randomly select a parameter vector from that distribution. Following Bayesian
terminology, we call this distribution of parameter vectors the prior distribution.
Prior-Equipped Mimicker. From a parametric family of mimickers, we design
an amalgamated mimicker as follows. First, the amalgamated mimicker ran-
domly selects a parameter vector from the prior distribution. This parameter
vector “points at” one of the fully-specified mimickers. Then a stream of data
is generated from the “pointed-at” (fully-specified) mimicker. We call such an
amalgamated mimicker a prior-equipped mimicker.
Mimicking a Two-Stage Experiment. Suppose that we have done a two-stage
experiment investigating a natural phenomenon. There are two vectors of obser-
vations: one from the first stage and one from the second stage.
Suppose that we construct a prior-equipped mimicker designed to imitate the
results of the experiment. Let θ denote the mimicker’s parameter vector and let
y1 and y2 denote the first- and second-stage outputs from the mimicker. Using an
informal style of notation found in [14, pp. 6–8], we may express the three-way
joint probability density of θ, y1 , and y2 as:

p(θ, y1 , y2 ) = p(θ) p(y1 |θ) p(y2 |θ), (7)

where p(θ) denotes the prior density of θ, where p(y1 |θ) and p(y2 |θ) denote
the densities of y1 and y2 conditioned on θ, and where the mimicker has been
designed so that y1 and y2 are conditionally independent given θ.
Mimicking the Two Stages Jointly. From (7), the joint density of (y1 , y2 ) is:

p(y1 , y2 ) = ∫ p(y2 |θ) p(y1 |θ) p(θ) dθ. (8)

Mimicking Just the Second Stage. From (8), the density of y2 is:

p(y2 ) = ∫ p(y2 |θ) p(θ) dθ. (9)

Mimicking the Second Stage Conditioned on the Output of the First. From (8),
the conditional density of y2 given the value of y1 is:

p(y2 |y1 ) = ∫ p(y2 |θ) [ p(y1 |θ) p(θ) / p(y1 ) ] dθ = ∫ p(y2 |θ) p(θ|y1 ) dθ. (10)
Remark. In (9) and in (10), the unconditional density p(y2 ) and the conditional
density p(y2 |y1 ) are expressed as weighted integrals of p(y2 |θ) with weights given
in the former case by the prior density p(θ) and, in the latter case, by the so-
called posterior density p(θ|y1 ). In standard Bayesian terminology, it is said that
the posterior density p(θ|y1 ) is obtained through “updating” the prior density
p(θ) by conditioning on the first-stage data y1 .
Continuation of the Mimicker’s Behavior from the First Stage to the Second. In
(10), p(y2 |y1 ) is the distribution of the mimicker’s second-stage output condi-
tional on its output from the first stage. It is the mimetic analog of the epistemic
concept of the posterior predictive distribution of y2 given y1 [14, Eq. 1.4].
Accuracy of Mimicking. Suppose that we have done a two-stage experiment with
empirical results y1emp and y2emp from the two stages. If a mimicker does a good
job of mimicking the results of the experiment, then y2emp will look like it could
plausibly come from the distribution with density p(y2 |y1emp ). But that may not
look plausible if the mimicker is not good.
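
The sketch below illustrates Eqs. 7-10 with a deliberately simple prior-equipped mimicker: normal data with known standard deviation and a normal prior on the mean, so that p(θ|y1) is available in closed form and p(y2|y1) can be mimicked by drawing θ from the posterior and then drawing second-stage data. The model, prior, and sample sizes are illustrative assumptions, not part of the paper.

# A sketch of a prior-equipped mimicker for a two-stage experiment (conjugate-normal example).
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.0                      # known observation s.d.
mu0, tau0 = 0.0, 2.0             # prior mean and s.d. of theta

# Stage 1: draw theta from the prior, then a first stream of artificial data
theta = rng.normal(mu0, tau0)
y1 = rng.normal(theta, sigma, size=20)

# The posterior p(theta | y1) is available in closed form for this conjugate setup
prec = 1 / tau0**2 + len(y1) / sigma**2
post_mean = (mu0 / tau0**2 + y1.sum() / sigma**2) / prec
post_sd = prec ** -0.5

# Eq. 10 by Monte Carlo: draw theta from p(theta | y1), then y2 from p(y2 | theta)
theta_draws = rng.normal(post_mean, post_sd, size=10_000)
y2_mimic = rng.normal(theta_draws, sigma)       # draws from p(y2 | y1)
print(y2_mimic.mean(), y2_mimic.std())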

6.4 Model Checking and Modification


One of the chief concerns of scientists is how well their model fits the data.
Checking model fit can be carried out by classical statistical methods as briefly
described above in Sect. 6.3.1. Or, model checking can be carried out using the
more general pragmatic-Bayesian methods described in BDA3 [14, Chap. 6].
The purpose of model checking is two-fold: The first purpose is to evaluate
whether the model needs to be modified to better fit the data. The second
purpose is to get some clues as to what kinds of modifications should be made.
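
A minimal sketch of such a check, in the spirit of posterior predictive checking, is given below; the data, the conjugate-normal mimicker, and the choice of test statistic (the sample maximum) are all illustrative assumptions rather than recommendations.

# An illustrative model check: compare a statistic of replicated artificial data with the observed one.
import numpy as np

rng = np.random.default_rng(4)
y_obs = rng.normal(0.5, 1.0, size=20)            # stand-in for empirical data

# Posterior for theta under a normal model with known sigma and a normal prior
sigma, mu0, tau0 = 1.0, 0.0, 2.0
prec = 1 / tau0**2 + len(y_obs) / sigma**2
post_mean = (mu0 / tau0**2 + y_obs.sum() / sigma**2) / prec
post_sd = prec ** -0.5

# Replicate artificial data sets from the fitted mimicker and compare the test statistic
theta_draws = rng.normal(post_mean, post_sd, size=5_000)
y_rep = rng.normal(theta_draws[:, None], sigma, size=(5_000, len(y_obs)))
ppp = np.mean(y_rep.max(axis=1) >= y_obs.max())  # posterior-predictive-style p-value
print(ppp)                                       # values near 0 or 1 would flag a deficiency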

Now, in epistemic Bayesianism (as opposed to pragmatic Bayesianism), once


a model has been updated by conditioning on data, it is irrational to further
modify the model so that it better fits the data.
In contrast, in mimetic modeling, it is not at all irrational to modify a model
so that it better fits data. In mimetic modeling, a model is an engineering pro-
posal for building a device that, it is hoped, will do a good job of mimicking a
natural phenomenon. If one such engineering proposal doesn’t work well, it isn’t
irrational to formulate another engineering proposal—in fact, it makes good
sense to do so.

7 “Dictatorial Statistics”: Its Avoidance via Mimetic Modeling

There are two schools of thought in statistics that, in my opinion, improperly


dictate to scientists what their data forces them to think:

• The first school advocates either accepting or rejecting a statistical hypothesis


on the basis of a statistical test. Actually, what I am calling a school is really
a set of three related, but warring, schools: the Fisher school, the Neyman-
Pearson school and the null-hypothesis significance-testing school [18]. By
these schools’ own analyses, the conclusion to accept a statistical hypothesis
may be in error. And, likewise, the conclusion to reject a statistical hypothesis
may be in error. So, it doesn’t make sense to regard a scientific question as
being settled on the basis of a statistical test.
• The second school is epistemic Bayesianism. That school says that, once a
model has been updated by conditioning on data, it is irrational to further
modify the model. In particular, it is irrational to modify the model so that
it better fits the data.

Both these problems of statistics usurping scientific judgment are avoided


when mimetic modeling is employed.
• First, mimetic modeling is not devoted to deciding whether a mimicker is
correct or wrong. When we propose a mimicker for a natural mechanism, we
know that realistically the mimicker cannot hope to capture the full complex-
ity of the natural mechanism. So, there is no point in evaluating whether the
mimicker is correct or wrong—we already know it’s wrong. Instead, mimetic
modeling aims at evaluating whether the mimicker does a good job or a poor
job of mimicking the natural phenomenon.
• Second, unlike epistemic modeling that is constrained by the “ball-and-chain”
of old beliefs, there is no problem in mimetic modeling with further modifying
a model that has already been updated by conditioning on data.

8 A Final Word
I end with a quote from the statistician Francis J. Anscombe [1]. He wrote:
The subject of statistics is itself subtle and puzzling, whereas textbooks
try to persuade the reader that all is clear and straightforward.
I heartily agree.

Acknowledgments. This paper is dedicated to the memory of William H. Batchelder,


who often helped me by giving me insightful comments on my ideas. I have also bene-
fitted from talking with and hearing the perspectives of Richard Chechile, Michael D.
Lee, Richard Shiffrin, and Philip L. Smith.

References
1. Anscombe, F.J.: Fisher’s ideas. Science 210, 180 (1980)
2. Atkinson, R.C., Bower, G.H., Crothers, E.J.: An Introduction to Mathematical
Learning Theory. Wiley, New York (1965)
3. Bamber, D.: Entailment with near surety of scaled assertions of high conditional
probability. J. Philos. Log. 29, 1–74 (2000)
4. Bamber, D.: What is probability? (Review of the book [15].) J. Math. Psychol. 47,
377–382 (2003)
5. Bamber, D.: Two interpretations of Bayesian statistical analyses. Unpublished talk
given at the 54th meeting of the Edwards Bayesian Research Conference, Fullerton,
California, April 2016
6. Bamber, D.: Bayes without beliefs: Mimetic interpretations of Bayesian statistics in
science. Unpublished talk given at the 49th meeting of the Society for Mathematical
Psychology, New Brunswick, New Jersey, August 2016
7. Bamber, D., Goodman, I.R., Gupta, A.K., Nguyen, H.T.: Use of the global implicit
function theorem to induce singular conditional distributions on surfaces in n
dimensions. Random Operators and Stochastic Equations. Part I. 18: 355–389.
Part II. 19: 1–43. Part III. 19: 217–265. Part IV. 19: 327–359 (2010/2011)
8. Bamber, D., Goodman, I.R., Nguyen, H.T.: Deduction from conditional knowledge.
Soft Comput. 8, 247–255 (2004)
9. Bamber, D., Goodman, I.R., Nguyen, H.T.: Robust reasoning with rules that have
exceptions: from second-order probability to argumentation via upper envelopes
of probability and possibility plus directed graphs. Ann. Math. Artif. Intell. 45,
83–171 (2005)
10. Bamber, D., Goodman, I.R., Nguyen, H.T.: High-probability logic and inheritance.
In: Houpt, J.W., Blaha, L.M. (eds.) Mathematical Models of Perception and Cog-
nition: A Festschrift for James T. Townsend, vol. 1, pp. 13–36. Psychology Press,
New York (2016)
11. Bower, G.H.: Application of a model to paired-associate learning. Psychometrika
26, 255–280 (1961)
12. Diaconis, P., Skyrms, B.: Ten Great Ideas About Chance. Princeton University
Press, Princeton (2018)
13. Efron, B.: Why isn’t everyone a Bayesian? Am. Stat. 40, 1–5 (1986)
14. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B.:
Bayesian Data Analysis, 3rd edn. CRC Press, Boca Raton, Florida (2013)

15. Gillies, D.: Philosophical Theories of Probability. Routledge, London (2000)


16. Kolmogorov, A.N.: Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer, Berlin (1933). (English translation by N. Morrison, Foundations of the Theory of Probability. Chelsea, New York (1956))
17. Lindley, D.V.: Comment (on [13]). Am. Stat. 40, 6–7 (1986)
18. Perezgonzalez, J.D.: Fisher, Neyman-Pearson or NHST? a tutorial for teaching
data testing. Front. Psychol. 6, Article 223 (2015). https://doi.org/10.3389/fpsyg.
2015.00223
19. Pollard, D.: A User’s Guide to Measure Theoretic Probability. Cambridge Univer-
sity Press, Cambridge (2002)
20. Savage, L.J.: The Foundations of Statistics. Wiley, New York (1954). (Second revised edition, Dover Publications, New York (1972))
21. Townsend, J.T., Wenger, M.J.: The serial-parallel dilemma: a case study in a link-
age of theory and method. Psychon. Bull. Rev. 11, 391–418 (2004)
22. Wasserman, L.: All of Statistics: A Concise Course in Statistical Inference.
Springer, New York (2004)
23. Woodward, J.: The Ice Age: A Very Short Introduction. Oxford University Press,
Oxford (2014)
Bayesian Modelling Structural Changes
on Housing Price Dynamics

Hong Than-Thi1 , Manh Cuong Dong2 , and Cathy W. S. Chen1(B)


1 Department of Statistics, Feng Chia University, Taichung, Taiwan
{tthong,chenws}@mail.fcu.edu.tw
2 Department of Economics, Feng Chia University, Taichung, Taiwan
cuonghay@gmail.com

Abstract. This paper examines the impact of the inflation rate and
interest rates on housing price dynamics in the U.S. and U.K. hous-
ing markets covering the period of 1991 to 2018. We detect structural
changes based on autoregressive models having exogenous inputs (ARX)
with GARCH-type errors via Bayesian methods. This study conducts
a Bayesian model comparison among three scenario structural-change
models by designing an adaptive Markov chain Monte Carlo sampling
scheme. The results from the deviance information criterion show that
ARX-GARCH models with two structural changes are preferable over
those with no/one structural change in both countries. The estimated
locations of breakpoints in the housing returns are dissimilar when we use
different exogenous variables, thus asserting the importance and neces-
sity of considering structural changes in housing markets. Bayesian esti-
mation results further reveal the different impacts of interest rates and
the inflation rate on the housing returns in each market. More specifi-
cally, the inflation rate has a negative impact on the U.S. housing market
in an economic downturn (including the global financial crisis), but no
strong relationship for the other periods and other exogenous variables.
Conversely, we note that interest rates have a reverse influence on the
U.K. housing market in a recession only and are insignificant in other
periods and other exogenous inputs. The results are consistent in one
aspect, whereby the house prices are more sensitive during the reces-
sion era.

Keywords: Deviance information criterion (DIC) · House prices · Interest rates · Inflation rate · MCMC method · Segmented ARX-GARCH model

1 Introduction
In modern days the national housing markets have tremendous effects on
economies. The development of a healthy housing sector can spur an economy
through higher aggregate expenditures, job creation and housing turnover. The


construction industry can also stimulate demand in related industries such as


household durables.
A house is the most valuable thing many people may ever own, and house
prices strongly correlate with household borrowing and consumption over the
business cycle (Cloyne et al. [18], Miller, Peng, and Sklarz [31]). Most studies in
the literature point out the influence of house prices on consumption through two
main effects: the wealth effect and the collateral effect. These two effects can be
summarized as follows: when house prices go up, homeowners become better off and feel more confident about their finances. Some people borrow more against the value
of their home, to spend on goods and services, renovate their house, supplement
their pension, or pay off other debt. When house prices go down, homeowners
run the risk that their house will be worth less than their outstanding mortgage.
When this occurs, people are more likely to cut down on spending and hold off
from making personal investments (Aoki, Proudman, and Vlieghe [5], Bajari,
Benkard and Krainer [7], Belsky and Joel [8], Buiter [12]).
Since household consumption is an important part of the economy, house
prices are also a key driver for the whole economy and a significant influence
in a recession (Case, Quigley, and Shiller [13], Mian, Rao, and Sufi [28], Mian
and Sufi [29]), with the 2007–2008 global financial crisis as a typical example
for this. Aside from influencing consumption, house prices also take on other
key roles; for instance, mortgage markets are important in the transmission of
monetary policy, or adequate house prices can facilitate labor mobility within
an economy and help economies adjust to adverse shocks (Zhu [41]). Another
aspect of their importance is their close relationship to the banking sector: when housing prices drop, borrowers are more likely to default on their home loans, causing banks to lose money. Many people (mostly seniors) have taken out a lifetime mortgage on their house, a type of loan on which no repayments are made before the end of the plan; instead, the lender recovers the debt from the sale of the property after the borrower dies or goes into long-term care. In this situation, precise valuation and prediction of house prices are crucial, because any big change in prices will lead to a great loss for either the lender or the borrower (Longstaff [26]).
Many studies have noted the momentum factor of house prices upon an econ-
omy. Therefore, many researchers attempt to predict house prices, look for the
influential factors that have the ability to drive them, or try to detect the over-
valuation of a housing prices market (Tsatsaronis and Zhu [38], Englund and
Ioannides [23], Van-Nieuwerburgh and Weill [39]). Our paper enriches the litera-
ture in this field by focusing on two of the most important factors affecting house
prices as largely agreed upon by many studies: interest rates and inflation rate
(Dougherty and Order [20], Englund and Ioannides [23], McQuinn and O’Reilly

[27]). Compared to other research, we not only examine the influence of those
two factors on the housing market in a general way, but also investigate the phe-
nomenon that housing prices usually increase into a new benchmark and rarely
come back to their old levels; in other words, it is very likely that structural
breaks exist in these prices.
There is a stream of literature that addresses the issue of parameter insta-
bility by estimating regime-switching models and by searching for structural
breaks under a predictive relation between equity returns and explanatory vari-
ables (Pesaran and Timmermann [33], Stock and Watson [35], Brown, Song, and
McGillivray [11], Dong et al. [19]). Several studies show that housing prices have experienced many shocks in the past. It is thus very worthwhile to detect any
structural breaks in a housing market and to study in greater detail each regime
divided by break points (Pain and Westaway [32], Andrew and Meen [3]). More-
over, as Miles [30] highlights, failure to incorporate volatility clustering may
lead to an inaccurate modeling of home prices. Many studies have employed the
autoregressive conditional heteroskedastic (ARCH) family of models, initiated
by Engle [22] and Bollerslev [9], to model dynamic volatility.
To cope with the aforementioned characteristics’ impact on housing prices,
such as autoregression, exogenous factors, structure changes, volatility cluster-
ing, and parameter change, we propose modelling the housing price dynamics by
segmented autoregressive models with exogenous inputs (ARX) and GARCH-
type errors. This segmented ARX-GARCH model is known as piecewise ARX-
GARCH model, in which the boundaries between the segments are breakpoints.
One may interpret a breakpoint as a critical or threshold value beyond or below
which some effects occur. These breakpoints are important to policymakers. In
this study the exogenous variables are either inflation rate or interest rates.
Choosing the number of breakpoints for the segmented ARX-GARCH model
is very crucial. More precisely, we detect structural changes based on the ARX
model with GARCH-type errors and fit a segmented ARX-GARCH model simul-
taneously via Bayesian methods. Chen, Gerlach, and Liu [16] illustrate accu-
rate Bayesian estimation and inference based on a time-varying heteroskedastic
regression model, which allows for multiple structural changes. In this study we
examine the effect of exogenous variables on the proposed model. We choose
an optimal number of breakpoints using a Bayesian information criterion, i.e.
deviance information criterion (DIC) proposed by Spiegelhalter et al. [34]. DIC
is a Bayesian version or generalization of the famous Akaike Information Cri-
terion (AIC). Overall, the Bayesian structural change approach used herein is
able to detect the presence of breaks, determine the number of breaks, estimate
both the time of the occurrence and the parameter values around the time of
the breaks, and shows the influence of exogenous variables on the target series.
By applying the above mentioned method to monthly house prices in the U.S.
and the U.K. from January 1991 to February 2018 and utilizing two exogenous
variables (inflation rate and nominal interest rates) in each country, we illustrate
that the existence of structural change is reasonable and detect two breakpoints in both housing markets; Pain and Westaway [32] and Andrew and Meen [3] draw

the same conclusion on structural changes in housing markets. Moreover, the


detected breakpoints are dissimilar when we change exogenous variables, and
the influences of these two exogenous variables toward housing prices in each
regime (the time periods separated by the detected breakpoints) in each coun-
try are totally different, thus proving the essence of examining the structural
breaks model. Specifically, the inflation rate has a negative impact on the U.S.
housing market in an economic downturn (including the global financial crisis),
but no strong relationship for the other periods and other exogenous variables.
By contrast, we note that interest rates have a reverse influence on the U.K.
housing market in a recession only and are insignificant in other periods and
other exogenous inputs. The results are consistent in one aspect, whereby the
house prices are more sensitive during the recession era.
The rest of the paper runs as follows. Section 2 introduces the detection
of structural breaks in a segmented ARX-GARCH model. Section 3 describes
Bayesian inferences for the proposed model. Section 4 presents the data descrip-
tion used herein. Section 5 displays the results and our discussion. Section 6 pro-
vides concluding remarks.

2 The Structure Change Model

Many previous studies pay attention on the problem of structural breaks in


time series. Andrews [4] uses Wald, Lagrange multiplier, and likelihood ratio-like
tests for parameter instability and structural change with an unspecified num-
ber of breakpoints. Hansen [25] examines the breakpoints using the bootstrap
approach. To decide the number of breakpoints, Yao [40] exploits the Bayesian
information criterion (BIC), while Bai and Perron [6] propose a SupWald type
test. Elliott and Muller [21] suggest that the relationship between particular
variables may change substantially over time based on a J-test. However, those
studies focus on testing for the existence of structural breaks instead of analyzing
properties of the estimated breakdates, and they ignore both the autoregressive
component in returns and the conditional heteroskedasticity of asset returns,
which are often statistically significant and those that are missing may distort
the results. To overcome those drawbacks, Chen et al. [16] generalize the stan-
dard return prediction model subject to the presence of structural breaks, which
includes both AR and heteroskedastic components.
Under a similar concept, we incorporate structural changes in an autoregres-
sive model with exogenous inputs (ARX), while allowing the conditional vari-
ances to follow a GARCH model. We thus consider the segmented ARX-GARCH

model, which can be expressed as:


       ⎧ φ^(1)′ r_{t−1} + ψ^(1)′ x_{t−1} + a_t,          t ≤ T_1,
 r_t = ⎨ φ^(2)′ r_{t−1} + ψ^(2)′ x_{t−1} + a_t,          T_1 < t ≤ T_2,
       ⎪   ⋮
       ⎩ φ^(k+1)′ r_{t−1} + ψ^(k+1)′ x_{t−1} + a_t,      T_k < t ≤ n,

 a_t = √(h_t) ε_t;   ε_t ∼ N(0, 1),                                                        (1)

       ⎧ α_0^(1) + ∑_{i=1}^{m} α_i^(1) a_{t−i}^2 + ∑_{j=1}^{s} β_j^(1) h_{t−j},            t ≤ T_1,
 h_t = ⎨ α_0^(2) + ∑_{i=1}^{m} α_i^(2) a_{t−i}^2 + ∑_{j=1}^{s} β_j^(2) h_{t−j},            T_1 < t ≤ T_2,
       ⎪   ⋮
       ⎩ α_0^(k+1) + ∑_{i=1}^{m} α_i^(k+1) a_{t−i}^2 + ∑_{j=1}^{s} β_j^(k+1) h_{t−j},      T_k < t ≤ n.

Here, r_t is the asset return at time t; r_{t−1} is a 2-dimensional vector, (1, r_{t−1})′, allowing an intercept and AR(1) term; x_{t−1} is a p-dimensional vector of exogenous variables or leading indicators; φ^(i) = (φ_0^(i), φ_1^(i))′ and ψ^(i) = (ψ_1^(i), . . . , ψ_p^(i))′ are the corresponding vectors of AR and regression coefficients in each regime subject to k structural breaks occurring at times (T_1, T_2, . . . , T_k); and the volatility h_t is recognized to be time-varying. This structural change model in (1) is piecewise linear over time.
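For readers who wish to experiment with the data-generating mechanism in (1), the following minimal Python sketch simulates a one-breakpoint ARX(1)-GARCH(1,1) process. The regime-specific parameter values and the exogenous series are illustrative assumptions only, not quantities taken from this paper.

```python
import numpy as np

def simulate_segmented_arx_garch(n=326, T1=186, seed=1):
    """Simulate a one-breakpoint ARX(1)-GARCH(1,1) process in the spirit of model (1).

    The regime-specific parameters and the exogenous input are illustrative placeholders.
    """
    rng = np.random.default_rng(seed)
    # (phi0, phi1, psi1, alpha0, alpha1, beta1) for regime 1 and regime 2
    pars = [(0.2, 0.5, -0.1, 0.03, 0.07, 0.25),
            (-0.2, 0.4, -0.3, 0.10, 0.15, 0.40)]
    x = rng.normal(0.0, 0.2, size=n)      # hypothetical exogenous series
    r = np.zeros(n)                       # returns
    h = np.zeros(n)                       # conditional variances
    a = np.zeros(n)                       # innovations a_t
    h[0] = 0.05
    for t in range(1, n):
        phi0, phi1, psi1, a0, a1, b1 = pars[0] if t <= T1 else pars[1]
        h[t] = a0 + a1 * a[t - 1] ** 2 + b1 * h[t - 1]
        a[t] = np.sqrt(h[t]) * rng.standard_normal()
        r[t] = phi0 + phi1 * r[t - 1] + psi1 * x[t - 1] + a[t]
    return r, x, h

r, x, h = simulate_segmented_arx_garch()
print(r[:5])
```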

3 Bayesian Inference
In order to make inferences and compare models, we estimate the unknown
parameters of model (1) in a Bayesian framework. Define θ = (φ′, α′, γ_k′)′ as the set of all unknown parameters, where φ = (φ_1′, . . . , φ_{k+1}′)′, φ_i = (φ^(i)′, ψ^(i)′)′, γ_k = (T_1, . . . , T_k)′, α = (α_1′, . . . , α_{k+1}′)′, and α_i = (α_0^(i), α_1^(i), . . . , α_m^(i), β_1^(i), . . . , β_s^(i))′. Letting r_{1,t} = (r_1, . . . , r_t)′, we present the conditional likelihood function of model (1) by

 L(r_{2,n} | r_1, θ) = ∏_{t=2}^{n} P( (r_t − μ_t) / √(h_t) ) × I_{it},                                         (2)

where I_{it} is an indicator variable such that I_{it} = I(T_{i−1} < t ≤ T_i), i = 1, . . . , k + 1, T_0 = 1, T_{k+1} = n, and μ_t = E(r_t | r_{1,t−1}, x_{t−1}) = φ^(i)′ r_{t−1} + ψ^(i)′ x_{t−1}. We consider the following restrictions on the parameters to ensure covariance stationarity and positive variances:

 α_0^(i) > 0;   α_j^(i), β_j^(i) > 0   and   ∑_{j=1}^{m} α_j^(i) + ∑_{j=1}^{s} β_j^(i) < 1,   i = 1, · · · , k + 1.          (3)

We now assume the AR and regression coefficients, φi , have a multivariate


Gaussian prior, N3 (φi0 , Σ i ), i = 1, . . . , k + 1 constrained for mean stationarity,
where φi0 = 0, and Σ i is a matrix with large numbers on the diagonal ele-
ments. The prior set-up of the breakpoint parameters γ k comes from the idea
of Chen et al. [16]; we employ a continuous but constrained uniform prior on
γ k , subsequently discretizing the estimates so that they become an actual time
index.
Without loss of generality, we explain how to set up a prior for breakpoints
with k = 2. The continuous versions are constrained in two ways: the first ensures
that T1 < T2 as required; while the second ensures that a sufficient sample size
exists in each regime for estimation. We assume priors for T1 and T2 are as
follows:
T1 ∼ Unif(a1 , b1 ) ; T2 |T1 ∼ Unif(a2 , b2 ),
where a1 and b1 are the 100hth and 100(1−2h)th percentiles of the set of integers
1, 2, . . . , n, respectively; e.g. h = 0.1 means that T1 ∈ (0.1n, 0.8n). Moreover, b2
is the 100(1 − h)th percentile of 1, 2, . . . , n and a2 = T1 + hn, so that at least
100h% of observations are in the range (T1 , T2 ). Consequently, the priors for T1
and T2 are uninformative and flat over the region, ensuring T1 < T2 and at least
100h% of observations are in each regime. We assume that αi follows a uniform
prior, p(αi ) ∝ I(Si ), for i = 1, · · · , 3, where Si is the set that satisfies (3).
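As an illustration of the constrained uniform prior just described (shown here for k = 2), the sketch below draws one prior sample of (T1, T2), approximating the stated percentiles of {1, . . . , n} by h·n, (1 − 2h)·n, and (1 − h)·n; the rounding to integers mimics the discretization step mentioned above. The value h = 0.1 matches the example in the text.

```python
import numpy as np

def draw_breakpoint_prior(n, h=0.1, rng=None):
    """Draw (T1, T2) from the constrained uniform prior described above (k = 2)."""
    rng = rng or np.random.default_rng()
    t1 = rng.uniform(h * n, (1.0 - 2.0 * h) * n)     # T1 ~ Unif(a1, b1)
    t2 = rng.uniform(t1 + h * n, (1.0 - h) * n)      # T2 | T1 ~ Unif(a2, b2)
    return int(round(t1)), int(round(t2))            # discretize to time indices

print(draw_breakpoint_prior(326))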
The prior for each grouping of parameters, π(.), multiplied by the likelihood
function in (2) leads to the conditional posterior distributions. The conditional
posterior distributions for γ k , φi , and αi , i = 1, . . . , k + 1 have non-standard
forms. We therefore employ the Metropolis and MH (Metropolis et al., 1953;
Hastings, 1970) methods to draw the MCMC iterates for the γ k , φi and αi ,
i = 1, . . . , k +1 groups. We use an adaptive MH MCMC algorithm that combines
a random-walk Metropolis and an independent kernel MH algorithm to speed
up convergence and grant optimal mixing. Many studies in the literature have
successfully utilized this technique, e.g. Chen and So [17], Chen, Gerlach, and
Lin [14,15], and Gerlach, Chen, and Chan [24], etc. Chen and So [17] illustrate
the detailed procedures of random-walk Metropolis and an independent kernel
MH algorithm.
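The adaptive scheme combines random-walk and independence-kernel MH moves; the sketch below shows only the random-walk Metropolis ingredient for a single parameter block, with the log-posterior supplied by the user. The step sizes and the toy target are placeholders, not part of the authors' sampler.

```python
import numpy as np

def rw_metropolis_block(theta, log_post, step, rng):
    """One random-walk Metropolis update for a block of parameters.

    theta    : current value of the block (1-D array)
    log_post : function returning the log-posterior kernel; it should return
               -np.inf when constraints such as (3) are violated
    step     : proposal standard deviations (same shape as theta)
    """
    proposal = theta + step * rng.standard_normal(theta.shape)
    log_ratio = log_post(proposal) - log_post(theta)
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True          # accept
    return theta, False                # reject

# toy usage with a standard normal "posterior" for a 3-dimensional block
rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(1000):
    theta, _ = rw_metropolis_block(theta, lambda b: -0.5 * np.sum(b ** 2),
                                   np.full(3, 0.5), rng)
print(theta)
```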

4 Data Description
In this study we use interest rates and inflation rate in turn as an exogenous
variable x in Eq. (1). The dataset consists of the monthly house price index,
interest rates, and consumer price index (CPI) in the U.S. and U.K. from January
1991 to February 2018. All data are from the Federal Reserve Bank of St. Louis
database. We calculate the percentage log returns for the housing market as
rt = (ln Pt − ln Pt−1 ) × 100, where rt is the percentage log returns at time t and
Pt is the house price index at time t. We calculate the inflation rate, subject to an
increase in the consumer price index, as it = ((CPIt − CPIt−1 ) /CPIt−1 ) × 100,
in which it is the inflation rate at month t and CPIt is the consumer price index
at time t.
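The two transformations above are straightforward to compute; the following sketch applies them to short hypothetical arrays of index levels (the numbers are placeholders, not the actual FRED series).

```python
import numpy as np

price = np.array([100.0, 100.4, 100.9, 101.1])   # hypothetical house price index levels
cpi = np.array([200.0, 200.3, 200.1, 200.6])     # hypothetical CPI levels

# percentage log return: r_t = (ln P_t - ln P_{t-1}) * 100
housing_return = np.diff(np.log(price)) * 100

# inflation rate: i_t = ((CPI_t - CPI_{t-1}) / CPI_{t-1}) * 100
inflation_rate = np.diff(cpi) / cpi[:-1] * 100

print(housing_return, inflation_rate)
```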

To provide a general understanding of the nature of each time series, Table 1


presents the summary statistics of all variables. In general, the U.K. has higher
average housing returns and average interest rate, while the U.S. has a higher
average inflation rate during the covered time period. The U.K. market seems to
be more variable and volatile than the U.S. market, since all of the U.K.’s three
variables have higher variance and standard deviation than those for the U.S.
These phenomena also arise because the U.K.’s variables have more extreme minimum and maximum values. The Augmented Dickey-Fuller test reveals
that the inflation rates in both countries are stationary, while housing returns
and interest rates in both markets are non-stationary. These results hence lend
a support to the necessity of examining the housing returns with a structural
break model. Our model includes the first-differenced monthly interest rates in
both countries, as those series are stationary and efficient to be an exogenous
variable in our model.

Table 1. Summary statistics of monthly housing returns, interest rates, and inflation rates.

Country  Series                    Mean    Max     Min     Variance  Unit root test(a)
U.S.     Housing return            0.296   1.156   −1.752  0.208     0.366
         Interest rate             2.970   7.170   0.110   5.143     0.438
         Interest rate change(b)   −0.017  0.800   −1.960  0.051     0.010
         Inflation rate            0.191   1.377   −1.771  0.066     0.010
U.K.     Housing return            0.424   3.427   −3.161  0.866     0.128
         Interest rate             4.233   13.857  0.298   8.852     0.310
         Interest rate change(b)   −0.041  0.577   −1.760  0.055     0.010
         Inflation rate            0.180   3.454   −0.964  0.147     0.020
(a) P-value for the Augmented Dickey-Fuller test.
(b) The first differenced interest rate.

Figures 1, 2, and 3 exhibit the time plots of housing returns, the first differ-
enced interest rates, and inflation rates in the U.S. and U.K. markets, respec-
tively. From Fig. 1, we observe during the 2007–2009 global financial crisis period
that there are big drops in housing returns for both countries, but they then
recover afterwards. The first differenced interest rates in Fig. 2 also show a big decrease in both countries during the crisis. The U.K. also experienced
another shock in interest rates during the early 1990s. Both countries go through
their own inflation rate shock period as Fig. 3 presents, with the U.S. in 2007–
2008 crisis period and the U.K. in the recession of 1991–1992. The large volatility in all three series in both countries lends support to the necessity of structural change detection in our study.

Fig. 1. Time plots of the U.S. monthly housing returns (upper panel) and U.K. monthly
housing returns (lower panel).

5 Results and Discussions


Motivated by Bollerslev, Chou, and Kroner [10], we consider a segmented ARX
with GARCH(1,1) volatility model in (1) since the GARCH(1,1) appears to
be sufficient to explain the volatility development for most return series. We
examine the effects of interest rates and inflation rate upon housing returns
separately in each housing market. This study uses DIC to determine the optimal
number of breakpoints in each ARX-GARCH model. The best model has the
smallest DIC value. A description of the DIC procedure can be found in Truong, Chen, and So [37].
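For readers unfamiliar with the criterion, a generic sketch of the DIC computation from MCMC output is given below; the log-likelihood function and the posterior draws are placeholders to be supplied by the fitted segmented ARX-GARCH model, and the toy example is only for illustration.

```python
import numpy as np

def dic(log_lik, draws):
    """Deviance information criterion from MCMC output.

    log_lik : function mapping a parameter vector to the log-likelihood of the data
    draws   : posterior draws, shape (n_draws, n_params)

    DIC = Dbar + pD, where D(theta) = -2 log L(theta), Dbar is the posterior
    mean deviance, and pD = Dbar - D(posterior mean of theta).
    """
    deviances = np.array([-2.0 * log_lik(theta) for theta in draws])
    d_bar = deviances.mean()
    p_d = d_bar - (-2.0 * log_lik(draws.mean(axis=0)))
    return d_bar + p_d

# toy usage: normal model with unknown mean and known unit variance
data = np.array([0.1, -0.3, 0.2, 0.4])
log_lik = lambda th: -0.5 * np.sum((data - th[0]) ** 2)
draws = np.random.default_rng(0).normal(data.mean(), 0.1, size=(500, 1))
print(dic(log_lik, draws))
```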
In the MCMC framework, we set up the initial values for each parameter
as φi = (0.05, 0.05, 0.05) and αi = (0.1, 0.1, 0.1); the initial value for a single
breakpoint date is the median of the sample dates, and for the two breakpoints

Fig. 2. Time plots of the first differenced monthly interest rates of the U.S. (upper
panel) and U.K. (lower panel).

the values are the 25th and 75th percentile dates. We perform 20,000 MCMC
iterations and discard the first 10,000 iterates for each analyzed data series.
Table 2 presents the results of three scenario structural-change models, i.e.
DIC values of each ARX-GARCH model for both markets. We first propose
k = 0, 1, and 2 and use DIC to determine how many breakpoints are ideal in each
case. The table also reports the breakpoint locations detected by our model in
each case. We do not consider a model with more than two breakpoints, because
the sample size is quite small (the sample size equals 326 for each market).
The DIC results reveal that models without structural break (k = 0) are least
preferred and two structural breaks (k = 2) are most preferred in all cases.
The result is fully in accordance with our expectation of structural change in housing returns, supporting the use of the structural change model. Figures 4 and 5 provide the housing returns in both countries along with

Fig. 3. Time plots of the inflation rate of the U.S. monthly CPI (upper panel) and the
U.K. monthly CPI (lower panel).

the breakpoint locations based on the most preferred model from Table 2. We
indicate the breakdates by vertical dashed lines.
For comparison, we also fit a two-breakpoint model with Student t inno-
vations to each housing series. The results do not show superiority among all
the competing models based on the DIC criterion. Therefore, we do not report
the estimates here to save space. Tables 3 and 4 establish the posterior means,
medians, standard errors, and 95% credible intervals according to Eq. (1) with
two breakpoints for both countries. We use interest rates and inflation rate sepa-
rately as the exogenous variable to examine their influences on the housing prices
in each regime (we call each time period divided by breakpoints a regime and
name them as regimes I, II, and III). When considering the interest rate case,
two detected breakdates are similar in both countries: June 2006 and January
2012. Regime II in this case seems to capture the full global financial crisis in
2007–2009, while regimes I and III capture more stable period of house prices.

Our target coefficients ψ_1^(i), i = 1, 2, and 3, show that the impact of interest rates on house prices is unnoticeable in all three regimes for the U.S. market. In contrast, the interest rates have a negative influence on the U.K. house prices in regime II (volatile regime), while the estimates of ψ_1^(1) and ψ_1^(3) are insignificant when we consider regimes I and III.

Table 2. Comparisons of deviance information criterion for three ARX-GARCH models to select an optimal number of breakpoints.

Market  Exogenous variable  No. of breaks  DIC       Breakpoint locations
U.S.    Interest rate       k = 0          −502.113
                            k = 1          −507.313  T1 = 214
                            k = 2          −556.403  T1 = 186, T2 = 253
        Inflation rate      k = 0          −501.938
                            k = 1          −545.689  T1 = 115
                            k = 2          −568.747  T1 = 114, T2 = 242
U.K.    Interest rate       k = 0          −502.133
                            k = 1          −506.516  T1 = 180
                            k = 2          −559.334  T1 = 186, T2 = 253
        Inflation rate      k = 0          173.231
                            k = 1          158.682   T1 = 200
                            k = 2          131.736   T1 = 67, T2 = 163
A bold number indicates the model with the smallest DIC value in each case.
Tj is the location of breakpoints (j = 1, 2).

The results above go against the viewpoint about the virtually uncontested
importance of interest rates toward the housing market. However, this is not
unrealistic since several studies struggle to achieve credible results concerning the
impact of interest rates (McQuinn and O’Reilly [27]). A possible reason for this
unusualness is that interest rates cannot increase or reduce the overall demand
for housing in the short run, but they just move it around. All countries including
the U.S. have to provide a huge amount of additional houses to their increasing
population every year. Faced with higher interest rates, people may delay plans
to buy a new house, and shift their demand from buying to renting. The higher
demand for renting will bring about more investors to invest in this market, and
as a result housing prices do not change quickly. Another explanation for the ineffectiveness of interest rates is that they tend to rise when the economy is growing; in this situation people can afford more and are still willing to buy a house, so neither housing demand nor house prices change much. A further reason, from Sutton, Mihaljek, and Subelyte [36], is that house prices adjust to interest rate changes gradually over time rather than immediately. That could be why the one-lag exogenous variable used in our model does not capture the effect of interest rates on house prices in the U.S.

Fig. 4. Time plots of the U.S. housing returns with detected breakpoints in accordance
with a segmented ARX-GARCH model. The exogenous variable is interest rate (upper
panel) and inflation rate (lower panel). The two estimated breakdates are indicated by
dashed (red) vertical lines.

When using the inflation rate as an exogenous variable in examining housing


returns, two detected breakdates are different in the two countries (June 2000
and February 2011 for the U.S. and July 1996 and July 2004 for the U.K.). The
U.S. case still keeps the same characteristics in that regime II covers the full
global financial crisis, while regimes I and III cover the more stable time. Regime
II also displays a high level of persistence (α_1^(2) + β_1^(2)) across markets when we fit
a segmented ARX-GARCH model with inflation rate as the exogenous variable
for the U.S. market. In contrast, for the U.K. case, regime II captures the more
stable period while regimes I and III capture the volatile time of the market.
The different regimes’ characteristics in the U.K. when using the inflation rate

Fig. 5. Time plots of the U.K. housing returns with detected breakpoints in accordance
with a segmented ARX-GARCH model. The exogenous variable is interest rate (upper
panel) and inflation rate (lower panel). The two estimated breakdates are indicated by
dashed (red) vertical lines.

compared to other cases are reasonable, since regime I captures the sudden high
inflation rate during the early 1990s in this country and regime III captures the
global financial crisis.
The estimates of ψ_1^(i), i = 1, 2, and 3, in the U.K. case suggest an irrelevant connection between the inflation rate and house prices in this country. Conversely, the estimate of ψ_1^(2) in the U.S. is negatively significant, while those of ψ_1^(1) and ψ_1^(3) are not, revealing the fact that the inflation rate only has converse effects
on house prices in a more unstable period. This phenomenon leads to the main
implication that the inflation rate, which is often associated with higher prices,

Table 3. Bayesian estimation results of Segmented ARX-GARCH model for the U.S. monthly housing returns.

Regime  Parameter  Mean     Median   S.E.    P2.5     P97.5
Interest rate
I       φ0^(1)     0.1723   0.1726   0.0287  0.1174   0.2303
        φ1^(1)     0.5975   0.5967   0.0617  0.4809   0.7229
        ψ1^(1)     −0.0563  −0.0558  0.0813  −0.2135  0.1097
        α0^(1)     0.0258   0.0255   0.0068  0.0150   0.0402
        α1^(1)     0.0690   0.0653   0.0287  0.0207   0.1404
        β1^(1)     0.2441   0.2381   0.1249  0.0284   0.4976
II      φ0^(2)     −0.1991  −0.1965  0.0744  −0.3503  −0.0632
        φ1^(2)     0.4348   0.4339   0.1331  0.1759   0.6997
        ψ1^(2)     −0.2471  −0.2432  0.2525  −0.7576  0.2463
        α0^(2)     0.1118   0.1115   0.0510  0.0190   0.2110
        α1^(2)     0.1916   0.1594   0.1389  0.0178   0.5582
        β1^(2)     0.3888   0.3635   0.2063  0.0494   0.8386
III     φ0^(3)     0.4911   0.4892   0.0643  0.3690   0.6229
        φ1^(3)     0.0165   0.0193   0.1204  −0.2350  0.2484
        ψ1^(3)     0.3254   0.3221   0.4400  −0.5697  1.1797
        α0^(3)     0.0260   0.0260   0.0090  0.0090   0.0442
        α1^(3)     0.1368   0.1023   0.1181  0.0024   0.4427
        β1^(3)     0.2564   0.2231   0.1805  0.0133   0.6878
        T1 (June 2006)      186      186      2.295   181      190
        T2 (January 2012)   253      252      1.322   250      257
Inflation rate
I       φ0^(1)     0.1803   0.1791   0.0445  0.0976   0.2696
        φ1^(1)     0.3926   0.3952   0.1016  0.1924   0.5934
        ψ1^(1)     0.0111   0.0091   0.1370  −0.2539  0.2829
        α0^(1)     0.0259   0.0257   0.0042  0.0179   0.0343
        α1^(1)     0.2453   0.2432   0.0690  0.1156   0.3857
        β1^(1)     0.1999   0.2007   0.0578  0.0822   0.3119
II      φ0^(2)     0.1525   0.1514   0.0364  0.0829   0.2261
        φ1^(2)     0.8303   0.8317   0.0564  0.7170   0.9368
        ψ1^(2)     −0.2293  −0.2299  0.0775  −0.3804  −0.0763
        α0^(2)     0.0006   0.0006   0.0004  0.0000   0.0015
        α1^(2)     0.0302   0.0302   0.0026  0.0251   0.0353
        β1^(2)     0.8958   0.8959   0.0049  0.8859   0.9052
III     φ0^(3)     0.4557   0.4556   0.0615  0.3401   0.5768
        φ1^(3)     0.0707   0.0685   0.1218  −0.1678  0.3061
        ψ1^(3)     0.0092   0.0108   0.1186  −0.2222  0.2479
        α0^(3)     0.0077   0.0076   0.0016  0.0045   0.0109
        α1^(3)     0.1162   0.1149   0.0393  0.0419   0.1985
        β1^(3)     0.7307   0.7315   0.0325  0.6684   0.7942
        T1 (June 2000)      114      113      2.1095  110      118
        T2 (February 2011)  242      242      1.3340  236      242
Tj is the location of breakpoints (j = 1, 2).

Table 4. Bayesian estimation results of Segmented ARX-GARCH model for the U.K. monthly housing returns.

Regime  Parameter  Mean     Median   S.E.    P2.5     P97.5
Interest rate
I       φ0^(1)     0.1825   0.1814   0.0287  0.1283   0.2396
        φ1^(1)     0.5791   0.5804   0.0606  0.4615   0.6936
        ψ1^(1)     0.0540   0.0558   0.0699  −0.0806  0.1849
        α0^(1)     0.0283   0.0278   0.0068  0.0163   0.0430
        α1^(1)     0.0692   0.0661   0.0316  0.0205   0.1409
        β1^(1)     0.1941   0.1809   0.1186  0.0165   0.4319
II      φ0^(2)     −0.1882  −0.1848  0.0700  −0.3349  −0.0592
        φ1^(2)     0.4432   0.4442   0.1295  0.1803   0.6986
        ψ1^(2)     −0.5238  −0.5211  0.2563  −1.0370  −0.0312
        α0^(2)     0.1133   0.1104   0.0502  0.0189   0.2111
        α1^(2)     0.1821   0.1531   0.1257  0.0114   0.5090
        β1^(2)     0.3546   0.3336   0.2040  0.0284   0.8046
III     φ0^(3)     0.4964   0.5000   0.0646  0.3765   0.6190
        φ1^(3)     0.0132   0.0105   0.1208  −0.2109  0.2585
        ψ1^(3)     −0.4124  −0.4153  0.5481  −1.5044  0.6554
        α0^(3)     0.0283   0.0275   0.0083  0.0135   0.0484
        α1^(3)     0.1226   0.0998   0.0944  0.0043   0.3389
        β1^(3)     0.2133   0.1953   0.1363  0.0103   0.5014
        T1 (June 2006)      186      185      2.205   181      190
        T2 (January 2012)   253      252      1.282   249      256.5
Inflation rate
I       φ0^(1)     0.0060   0.0035   0.1398  −0.2668  0.2750
        φ1^(1)     −0.1891  −0.1900  0.1540  −0.4855  0.1082
        ψ1^(1)     −0.0347  −0.0386  0.2067  −0.4548  0.3729
        α0^(1)     0.3507   0.3720   0.0903  0.1492   0.4633
        α1^(1)     0.2216   0.1912   0.1524  0.0124   0.5712
        β1^(1)     0.4300   0.4317   0.1752  0.0925   0.7567
II      φ0^(2)     0.9147   0.9165   0.1665  0.5898   1.2382
        φ1^(2)     0.1616   0.1605   0.1383  −0.1048  0.4412
        ψ1^(2)     −0.0269  −0.0245  0.2148  −0.4466  0.3831
        α0^(2)     0.3305   0.3423   0.0864  0.1380   0.4570
        α1^(2)     0.2100   0.1797   0.1411  0.0194   0.5689
        β1^(2)     0.2537   0.2188   0.1756  0.0188   0.6546
III     φ0^(3)     0.2698   0.2686   0.0599  0.1546   0.3879
        φ1^(3)     0.2789   0.2767   0.0937  0.0945   0.4524
        ψ1^(3)     −0.1717  −0.1761  0.1576  −0.4732  0.1398
        α0^(3)     0.0470   0.0389   0.0344  0.0039   0.1271
        α1^(3)     0.2373   0.2213   0.0999  0.0891   0.4752
        β1^(3)     0.6488   0.6693   0.1426  0.3427   0.8736
        T1 (July 1996)      67       66.0000  2.1966  64.0000   72.5000
        T2 (July 2004)      163      162.0000 5.0665  157.0000  169.0000
Tj is the location of breakpoints (j = 1, 2).

Fig. 6. The ACF plots of MCMC estimates for parameters θ are from a segmented
ARX-GARCH model for the U.S. housing returns. The exogenous variable: inflation
rate.

makes them retract during a recession. This goes against the conclusions in many
previous studies that house prices exhibit a stable inflation hedge (Anari and
Kolari [2], Abelson et al. [1]), but actually this is rather explainable. Remember
that the inflation rate is calculated using CPI, which does not include house
prices. During a stagnant economy, the inflation rate or higher prices mean that
households have to spend more on consumption, and their budget will be lower

Fig. 7. The trace plots of MCMC estimates for parameters θ are from a segmented
ARX-GARCH model for the U.S. housing returns. The exogenous variable: inflation
rate.

in response. For this reason, the demand for houses will decrease, leading to a
drop in prices. Therefore, the negative causation of inflation rate to house prices
in a weak economic period is reasonable. Previous studies also noted this reverse
relationship (see Tsatsaronis and Zhu [38]).
To verify the estimation of our model, we provide the ACF plots and trace
plots of each estimated coefficient of the segmented ARX-GARCH model. Trace

Fig. 8. The volatility estimates of a segmented ARX-GARCH model for the U.S. hous-
ing returns. The exogenous variable is the first differenced interest rate (upper panel)
and inflation rate (lower panel). The two estimated breakdates are indicated by dashed
(red) vertical lines.

plots provide an important tool for assessing the mixing of a chain. Due to
limited space, we provide the results in the case of the U.S. inflation rate as
one example in Figs. 6 and 7. The other results are available upon request. In
summary, all ACF plots die down quickly, indicating convergence of the chains.
Visual inspection of the trace plots shows that all MCMC samples mix well.
Figure 8 presents the volatility estimates for the U.S. housing returns with
detected breakpoints in accordance with the segmented ARX-GARCH model.
Even when the exogenous variables are different, both panels capture the struc-
tural breaks in volatility very well since the most volatile periods fall entirely in
regime II, while regimes I and III cover the stable periods.

Fig. 9. The volatility estimates of a segmented ARX-GARCH model for the U.K.
housing returns. The exogenous variable is the first differenced interest rate (upper
panel) and inflation rate (lower panel). The two estimated breakdates are indicated by
dashed (red) vertical lines.

Figure 9 demonstrates the volatility estimates for the U.K. housing market.
Once again, the breakpoints detected by our model capture the structure changes
of volatility very accurately. In the interest rates case, the highest volatile period
falls completely in regime II, while other regimes catch the constant time of
the market. When we use the inflation rate as the exogenous variable, even
the detected breakdates are different from the previous case due to the sudden
fluctuation of the U.K. inflation rate during the early 1990s; those breakpoints
capture the whole stable period in regime II, while the fluctuating periods are in
regimes I and III. These results once more confirm the precision of our method.

6 Conclusion
This research studies the impacts of interest rates and inflation rate on hous-
ing prices in the U.S. and U.K. by employing the Bayesian structural changes
approach. The breakpoint detection process using DIC shows a preference for
the two-breakpoint models (k = 2) compared to the no/one breakpoint model.
These findings support the use of the structural change model for the housing market.
The estimated results reveal two contrasting situations in two countries. The
interest rates cause an insignificant effect on house prices in the U.S., while only
having negative effects on house prices in an economic downturn in the U.K.
There are several reasons for this: interest rates do not change overall demand
for housing, but rather just move it from buying demand to renting demand;
higher interest rates could be a result of a better economy in which people have
greater ability to afford a new house; or interest rate changes influence house
prices gradually over time, but not immediately, and thus one lag does not
describe this relationship. When considering the inflation rate case, there is no
relation between it and house prices in the U.K., while it only causes a negative
impact on house prices in the U.S. during a volatile time of the economy. One
likely reason is that a higher inflation rate leads to a lower budget for households,
and hence demand and housing price also turn lower in response. However, the
opposite outcomes in the two countries are consistent in one aspect: house prices
in both countries are more sensitive during recessionary times.
This paper suggests that future research should account for structural changes in the housing market. Ignoring this influential characteristic may lead to misleading conclusions about housing price performance.

Acknowledgments. Cathy W.S. Chen’s research is funded by the Ministry of Science


and Technology, Taiwan (MOST 107-2118-M-035-005-MY2).

References
1. Abelson, P., Joyeux, R., Milunovich, G., Chung, D.: Explaining house prices in
Australia: 1970–2003. Econ. Rec. 81, 96–103 (2005)
2. Anari, A., Kolari, J.: House prices and inflation. Real Estate Econ. 30, 67–84
(2002)
3. Andrew, M., Meen, G.: House price appreciation, transactions and structural
change in the British housing market: a macroeconomic perspective. Real Estate
Econ. 31, 99–116 (2003)
4. Andrews, D.W.K.: Tests for parameter instability and structural change with
unknown changepoint. Econometrica 61, 821–856 (1993)
5. Aoki, K., Proudman, J., Vlieghe, G.: House prices, consumption, and monetary
policy: a financial accelerator approach. J. Financ. Intermediation 13, 414–435
(2004)
6. Bai, J., Perron, P.: Estimating and testing linear models with multiple structural
changes. Econometrica 66, 47–78 (1998)

7. Bajari, P., Benkard, L., Krainer, J.: House prices and consumer welfare. J. Urban
Econ. 58, 474–487 (2005)
8. Belsky, E., Joel, P.: Housing wealth effects: housing’s impact on wealth accumula-
tion, wealth distribution and consumer spending. National Center for Real Estate
Research Report (2004)
9. Bollerslev, T.: Generalized autoregressive conditional heteroskedasticity. J. Econo-
metrics 31, 307–327 (1986)
10. Bollerslev, T., Chou, R.Y., Kroner, K.F.: ARCH modeling in finance: a review of
the theory and empirical evidence. J. Econometrics 52, 5–59 (1992)
11. Brown, J.P., Song, H., McGillivray, A.: Forecasting UK house prices: a time vary-
ing coefficient approach. Econ. Model. 14, 529–548 (1997)
12. Buiter, W.H.: Housing wealth isn’t wealth, Working Paper, London School of
Economics and Political Science (2008)
13. Case, K.E., Quigley, J.M., Shiller, R.J.: Wealth effects revisited: 1975–2012, Working Paper (2013)
14. Chen, C.W.S., Gerlach, R., Lin, A.M.H.: Falling and explosive, dormant and rising
markets via multiple-regime financial time series models. Appl. Stochast. Models
Bus. Ind. 26, 28–49 (2010)
15. Chen, C.W.S., Gerlach, R., Lin, E.M.H.: Volatility forecast using threshold het-
eroskedastic models of the intra-day range. Comput. Stat. Data Anal. 52, 2990–
3010 (2008)
16. Chen, C.W.S., Gerlach, R., Liu, F.C.: Detection of structural breaks in a time-
varying heteroscedastic regression model. J. Stat. Plann. Infer. 141, 3367–3381
(2011)
17. Chen, C.W.S., So, M.K.P.: On a threshold heteroscedastic model. Int. J. Forecast.
22, 73–89 (2006)
18. Cloyne, J., Huber, K., Ilzetzki, E., Kleven, H.: The Effect of House Prices on
Household Borrowing: A New Approach, Working Paper (2017)
19. Dong, M.C., Chen, C.W.S., Lee, S., Sriboonchitta, S.: How strong is the relation-
ship among Gold and USD exchange rates? analytics based on structural change
models. Comput. Econ. (2017). https://doi.org/10.1007/s10614-017-9743-z
20. Dougherty, A., Order, R.V.: Inflation, housing costs, and the consumer price index.
Am. Econ. Rev. 72, 154–164 (1982)
21. Elliott, G., Muller, U.: Optimally testing general breaking processes in linear time
series models, Working Paper, Department of Economics, University of California,
San Diego (2003)
22. Engle, R.F.: Autoregressive conditional heteroscedasticity with estimates of the
variance of United Kingdom inflation. Econometrica 50, 987–1008 (1982)
23. Englund, P., Ioannides, Y.M.: House price dynamics: an international empirical
perspective. J. Hous. Econ. 6, 119–136 (1997)
24. Gerlach, R., Chen, C.W.S., Chan, N.C.Y.: Bayesian time-varying quantile fore-
casting for value-at-risk in financial markets. J. Bus. Econ. Stat. 29, 481–492
(2011)
25. Hansen, B.: Testing for structural change in conditional models. J. Econometrics
97, 93–115 (2000)
26. Longstaff, F.A.: Borrower credit and the valuation of mortgage-backed securities.
Real Estate Econ. 33, 619–661 (2005)
27. McQuinn, K., O’Reilly, G.: Assessing the role of income and interest rates in
determining house prices. Econ. Model. 25, 377–390 (2008)
28. Mian, A., Rao, K., Sufi, A.: Housing balance sheets, consumption, and the eco-
nomic slump. Q. J. Econ. 128 (2013)

29. Mian, A., Sufi, A.: House Price Gains and U.S. Household Spending from 2002 to
2006, Working Paper (2014)
30. Miles, W.: Volatility clustering in U.S. home prices. J. Real Estate Res. 30, 73–90
(2008)
31. Miller, N., Peng, L., Sklarz, M.: House prices and economic growth. J. Real Estate
Finance Econ. 42, 522–541 (2009)
32. Pain, N., Westaway, P.: Modelling structural change in the UK housing market: a
comparison of alternative house price models. Econ. Model. 14, 587–610 (1997)
33. Pesaran, M.H., Timmermann, A.: How costly is it to ignore breaks when forecast-
ing the direction of a time series? Int. J. Forecast. 20, 411–425 (2004)
34. Spiegelhalter, D.J., Best, N.G., Carlin, B.P., Vander Linde, A.: Bayesian measures
of model complexity and fit. J. Roy. Stat. Soc. B 64, 583–640 (2002)
35. Stock, J.H., Watson, M.W.: Evidence on structural instability in macroeconomic
time series relations. J. Bus. Econ. Stat. 14, 11–30 (1996)
36. Sutton, G.D., Mihaljek, D., Subelyte, A.: Interest rates and house prices in the
United States and around the world, BIS Working Paper No. 665 (2017)
37. Truong, B.C., Chen, C.W.S., So, M.K.P.: Model selection of a switching mech-
anism for financial time series. Appl. Stochast. Models Bus. Ind. 32, 836–851
(2016)
38. Tsatsaronis, K., Zhu, H.: What drives housing price dynamics: cross-country evi-
dence. BIS Q. Rev. (2014)
39. Van-Nieuwerburgh, S., Weill, P.O.: Why has house price dispersion gone up? Rev.
Econ. Stud. 77, 1567–1606 (2010)
40. Yao, Y.C.: Estimating the number of changepoints via Schwarz criterion. Stat.
Probab. Lett. 6, 181–189 (1988)
41. Zhu, M.: Opening Remarks at the Bundesbank/German Research Founda-
tion/IMF Conference (2014)
Cumulative Residual Entropy-Based
Goodness of Fit Test for Location-Scale
Time Series Model

Sangyeol Lee(B)

Department of Statistics, Seoul National University, Seoul, South Korea


sylee@stats.snu.ac.kr

Abstract. This study considers the cumulative residual entropy (CRE)-


based goodness of fit (GOF) test for location-scale time series models.
The CRE-based GOF test for iid samples is introduced and the asymp-
totic behavior of the CRE-based GOF test and its bootstrap version is
investigated for location-scale time series models. In particular, the influ-
ence of change points on the GOF test is studied through Monte Carlo
simulations.

Keywords: Goodness of fit test · CRE-based test · Change point

1 Introduction

The GOF test has been playing a central role in matching given data sets with
the best fitted probabilistic models and has been applied to diverse applications
in economics, finance, engineering, and medicine. We refer to D’Agostino and
Stephens [2] for a general review of GOF tests. The testing method based on the
empirical process has been popular because it can generate several famous GOF
tests such as Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling
tests. For the asymptotic properties of the empirical process, see Durbin [3] for
iid samples and Lee and Wei [14] and Lee and Taniguchi [12] for autoregressive
and GARCH models. On the other hand, some authors considered the empirical
characteristic function-based GOF test, see Lee et al. [11] and the papers cited
therein.
Lee et al. [13] considered an entropy-based test, and Lee [7] and Lee et al. [8]
later showed that the entropy test performs well in time series models such as
GARCH models. Lee and Kim [9] and Kim and Lee [6] extended the entropy-
based GOF test to location-scale time series models and developed its bootstrap
test. Lee et al. [10] recently showed that a GOF test based on the CRE (Rao
et al. [19]) or the cumulative Kullback-Leibler divergence (Baratpour and Rad
[1]) compares well with or outperforms existing tests in various situations. They
particularly demonstrated the superiority to the entropy-based GOF test of Lee
et al. [13] in iid samples. Here we consider the CRE-based entropy test for


location-scale time series models and its bootstrap version, and seek for their
asymptotic properties.
This paper also investigates the performance of the GOF test when the
parameter experiences a change. It is well known that financial time series often
experience structural changes due to critical events and monetary policy changes
and ignoring them can lead to a false conclusion. The change point test has a long
history and there are a vast amount of literatures. For recent references, we refer
to Lee and Lee [15] and Oh and Lee [18], who study the CUSUM test for GARCH-
type models and general nonlinear integer-valued autoregressive models, and
the papers cited therein. As seen in other inferences, the presence of parameter
changes can undermine GOF tests and mislead practitioners to a false conclu-
sion. For example, the normality test can be rejected owing to parameter changes
as seen in our simulations. However, if the distribution family for the GOF test
is broad or flexible enough, the impact of parameter changes on the GOF test
could be weakened to certain extent. Our simulation results show that the rejec-
tion of GOF tests can be attributed to the presence of parameter changes while
the acceptance of GOF test does not necessarily provide a complete evidence for
no change points. In particular, the latter phenomenon can become more promi-
nent as the underlying model gets more complicated, for example, from iid to
time series models. In fact, it is well known that piecewise stationary generalized
autoregressive conditionally heteroscedastic (GARCH) processes induced from
parameter changes can be easily misidentified as integrated GARCH (IGARCH)
processes, see Maekawa et al. [16].
This paper is organized as follows. Section 2 introduces the CRE-based GOF
test for iid samples and reviews its asymptotic behavior based on Lee et al.
[10]. Section 3 extends the results in Sect. 2 to location-scale time series models.
Section 4 carries out a simulation study to check the influence of parameter
changes on the GOF test. Section 5 provides concluding remarks.

2 Cumulative Residual Entropy Test


In this section we review the CRE-based GOF test in Lee et al. [10]. For any
density function f , the entropy-based GOF test is constructed based on the
Boltzmann-Shannon entropy defined by
 ∞
H(f ) = − f (x) log(f (x))dx (1)
−∞

(Jayens [4]). Lee et al. [13] developed a GOF test using an approximation form
of the integral in (1), and Lee et al. [10] recently considered a GOF test using the
CRE wherein the entropy is defined based on cumulative residual distributions.
That is, for any distribution function F with bounded support [0,1], we consider
the modification of (1) as follows:
 IH(F) = − ∫_0^1 (1 − F(x)) log( (1 − F(x)) / (1 − x) ) dx,

which becomes a cumulative Kullback-Leibler divergence (Baratpour and Rad [1]).
Putting

 IS_m(F) = ∑_{i=1}^{m} (Ψ_F(s_i) − Ψ_F(s_{i−1})) log( (Ψ_F(s_i) − Ψ_F(s_{i−1})) / (Ψ_0(s_i) − Ψ_0(s_{i−1})) ),            (2)

where Ψ_F(s) = ∫_s^1 (1 − F(x)) dx and Ψ_0(s) = ∫_s^1 (1 − x) dx = 1 − s − (1 − s^2)/2, m is the number of disjoint intervals for partitioning the interval [0, 1], and 0 < s_0 < s_1 < · · · < s_m = 1 are preassigned partition points, provided that f = F′ satisfies 0 < inf_x f(x) ≤ sup_x f(x) < ∞ and max_{1≤i≤m} |s_i − s_{i−1}| → 0 as m → ∞, one can see that as m → ∞,

 IS_m(F) −→ − ∫_0^1 (1 − F(x)) log( (1 − F(x)) / (1 − x) ) dx = IH(F),
see the proof in the Appendix of Lee et al. [10].
To construct a test statistic, we further consider a generalized version of (2)
by imposing some weights:
 IS_m^w(F) = ∑_{i=1}^{m} w_i (Ψ_F(s_i) − Ψ_F(s_{i−1})) log( (Ψ_F(s_i) − Ψ_F(s_{i−1})) / (Ψ_0(s_i) − Ψ_0(s_{i−1})) ),        (3)

where w is a vector of weights with 0 ≤ w_i ≤ 1 and ∑_{i=1}^{m} w_i = 1. Observe that if F is the uniform distribution on [0, 1], IS_m^w(F) = 0.
Suppose that one wishes to test whether X_i, i = 1, . . . , n, is a random sample from an unknown cumulative distribution function F. For this task, we set up the following hypotheses:

 H_0 : F = F_0   vs.   H_1 : F ≠ F_0.

Note that F_0(X_i) follow U[0, 1] under the null, whence the testing problem is reduced to a uniformity test. In view of (3), as a GOF test, we consider

 IS_m^w(F_n) = ∑_{i=1}^{m} w_i (Ψ_n(s_i) − Ψ_n(s_{i−1})) log( (Ψ_n(s_i) − Ψ_n(s_{i−1})) / (Ψ_0(s_i) − Ψ_0(s_{i−1})) ),        (4)

where Ψ_n = Ψ_{F_n} with F_n(x) = (1/n) ∑_{i=1}^{n} I(F_0(X_i) ≤ x), so that Ψ_n(s) = (1/n) ∑_{i=1}^{n} (F_0(X_i) − s) I(F_0(X_i) > s). Then, we reject H_0 if sup_w |IS_m^w(F_n)| is large. The test based on (4) can be implemented using the asymptotic result as follows (Theorem 2.1 of Lee et al. [10]): under H_0, as n → ∞,

 T_n := √n sup_{w∈W} |IS_m^w(F_n)| →_d sup_{w∈W} | ∑_{i=1}^{m} w_i (IB°(s_i) − IB°(s_{i−1})) |,        (5)

where IB°(s) = − ∫_s^1 B°(x) dx and B° is a Brownian bridge on [0, 1], W denotes any subset, with a finite number of elements, of the space of weights 0 ≤ w_i ≤ 1 with ∑_{i=1}^{m} w_i = 1, and 0 = s_0 < s_1 < · · · < s_m = 1.

Since −IH(F) − ( ∫_0^1 x dF(x) − 1/2 ) = 0 if and only if F is a uniform distribution on [0, 1] (cf. Baratpour and Rad [1]), Lee et al. [10] also suggested using the test T̃_n := T_n + Δ_n with Δ_n = (1/√(nm)) | ∑_{i=1}^{n} (F_0(X_i) − 1/2) |. Their simulation study reveals that T̃_n outperforms T̂_n in some examples.
Below, we consider the problem of testing the null and alternative hypotheses:

H0 : F ∈ {Fθ : θ ∈ Θd } vs. H1 : not H0 ,

where {Fθ } is a family of distributions. Given X1 , . . . , Xn , we check whether the


transformed random variables Ûi = Fθ̂n (Xi ) follow a uniform distribution on
[0, 1], say, U [0, 1], where θ̂n is an estimate of true parameter θ0 . We impose the
following conditions:
(A1) Fθ has a positive density fθ .
(A2) x ↦ ∂F_θ(x)/∂θ is uniformly continuous on (−∞, ∞), and sup_{θ∈N} sup_x ‖∂²F_θ(x)/∂θ∂θ^T‖ ≤ L for some L > 0 and sup_{θ∈N} ∫ ‖∂F_θ(x)/∂θ‖ f_θ(x) dx < ∞ for some compact neighborhood N of θ_0.
(A3) Under the null,

 √n(θ̂_n − θ_0) = (1/√n) ∑_{i=1}^{n} l(X_i; θ_0) + o_p(1),

where l(x; θ) is measurable with ∫ l(x; θ) f_θ(x) dx = 0 and satisfies

 sup_{θ∈N} ∫ ‖l(x; θ)‖^{2+δ} f_θ(x) dx < ∞

for some δ > 0.


As a test statistic, we can use a version similar to (5) based on IS_m^w(F̂_n) with F̂_n(s) = (1/n) ∑_{i=1}^{n} I(Û_i ≤ s), namely,

 T̂_n = √n sup_{w∈W} |IS_m^w(F̂_n)|   and   T̃_n = T̂_n + (1/√(nm)) | ∑_{i=1}^{n} (F_{θ̂_n}(X_i) − 1/2) |.        (6)

In implementation, we generate iid w_{ij}, j = 1, · · · , J, from U[0, 1], where J is a large integer, say, 1,000, and then use w̃_{ij} = w_{ij} / (w_{1j} + · · · + w_{mj}) and s_i = i/m, i = 1, · · · , m, to apply the test:

 T̂_n = √n max_{1≤j≤J} | ∑_{i=1}^{m} w̃_{ij} ( Ψ̂_n(i/m) − Ψ̂_n((i−1)/m) ) log( ( Ψ̂_n(i/m) − Ψ̂_n((i−1)/m) ) / ( Ψ_0(i/m) − Ψ_0((i−1)/m) ) ) |        (7)

with Ψ̂_n = Ψ_{F̂_n}.
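A minimal sketch of the statistic in (7) for the simple null (known F_0) is given below; the input u plays the role of the transformed values F_0(X_i), and the random weights follow the normalization w̃_{ij} described above. This is an illustrative implementation, not code from the paper.

```python
import numpy as np

def psi_hat(u, s):
    """Empirical Psi_n(s) = (1/n) * sum (u_i - s) * I(u_i > s)."""
    return np.mean((u - s) * (u > s))

def psi0(s):
    """Psi_0(s) = integral from s to 1 of (1 - x) dx."""
    return 1.0 - s - (1.0 - s ** 2) / 2.0

def cre_test_stat(u, J=1000, seed=0):
    """CRE-based statistic T_n of (7) with random weights and s_i = i/m."""
    n = len(u)
    m = int(np.floor(n ** (1.0 / 3.0)))
    s = np.arange(m + 1) / m
    d_hat = np.array([psi_hat(u, s[i]) - psi_hat(u, s[i - 1]) for i in range(1, m + 1)])
    d0 = np.array([psi0(s[i]) - psi0(s[i - 1]) for i in range(1, m + 1)])
    terms = np.zeros(m)
    nz = d_hat != 0.0                       # guard against empty increments
    terms[nz] = d_hat[nz] * np.log(d_hat[nz] / d0[nz])
    w = np.random.default_rng(seed).uniform(size=(J, m))
    w = w / w.sum(axis=1, keepdims=True)    # normalized weights w~_{ij}
    return np.sqrt(n) * np.max(np.abs(w @ terms))

u = np.random.default_rng(1).uniform(size=200)   # plays the role of F0(X_i) under H0
print(cre_test_stat(u))
```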


Further, we use m = [n^{1/3}] and adopt the following bootstrap method (a generic code sketch of these steps is given after the list):
(i) From the data X1 , . . . , Xn , obtain the MLE θ̂n .

(ii) Generate X1∗ , . . . , Xn∗ from Fθ̂n (·) to obtain T̂n , denoted by T̂n∗ , with the
preassigned m in (7) based on these random variables.
(iii) Repeat the above procedure B times and calculate the 100(1 − α)% per-
centile of the obtained B number of T̂n∗ values for given 0 < α < 1.
(iv) Reject H0 if the value of T̂n obtained from the original observations is larger
than the obtained 100(1 − α)% percentile in (iii).
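Steps (i)–(iv) can be written as a generic helper; in the sketch below, fit_mle, sample_from_fitted, and test_stat are user-supplied placeholders standing for the MLE of the hypothesized family, simulation from F_{θ̂_n}, and the statistic in (7), respectively. None of these names comes from the paper.

```python
import numpy as np

def bootstrap_gof_decision(x, fit_mle, sample_from_fitted, test_stat,
                           B=500, alpha=0.05, seed=0):
    """Parametric bootstrap decision rule for the CRE-based GOF test (steps (i)-(iv)).

    fit_mle(x)                        -> parameter estimate theta_hat
    sample_from_fitted(theta, n, rng) -> bootstrap sample of size n from F_theta
    test_stat(x, theta)               -> statistic of (7) computed from F_theta(x_i)
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_hat = fit_mle(x)
    boot_stats = np.empty(B)
    for b in range(B):
        x_star = sample_from_fitted(theta_hat, n, rng)
        boot_stats[b] = test_stat(x_star, fit_mle(x_star))
    critical_value = np.quantile(boot_stats, 1.0 - alpha)
    reject_h0 = test_stat(x, theta_hat) > critical_value
    return reject_h0, critical_value
```

In practice, test_stat would reuse the statistic sketched after (7), applied to the transformed values F_{θ̂_n}(X_i).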

If θ̂_n^*, obtained from the bootstrap sample in (ii) above, satisfies

 √n(θ̂_n^* − θ̂_n) = (1/√n) ∑_{i=1}^{n} l(X_i^*; θ̂_n) + o_p^*(1),        (8)

we have the following (cf. Theorem 2.4 of Lee et al. [10]).


Proposition 1. Let T̂_n^* := √n sup_{w∈W} |IS_m^w(F̂_n^*)| with F̂_n^*(x) = (1/n) ∑_{i=1}^{n} I(F_{θ̂_n^*}(X_i^*) ≤ x), and T̃_n^* = T̂_n^* + (1/√(nm)) | ∑_{i=1}^{n} (F_{θ̂_n^*}(X_i^*) − 1/2) |. Then, under (A1)–(A3) and (8) and under H_0, for all −∞ < x < ∞,

 |P^*(T̂_n^* ≤ x) − P(T̂_n ≤ x)| → 0 in probability,
 |P^*(T̃_n^* ≤ x) − P(T̃_n ≤ x)| → 0 in probability.

The result of Proposition 1 is extended to location-scale time series models in


Sect. 3. In our empirical study in Sect. 4, we use the bootstrap version of the test
in (7):

 T̂_n^* = √n max_{1≤j≤J} | ∑_{i=1}^{m} w̃_{ij} ( Ψ̂_n^*(i/m) − Ψ̂_n^*((i−1)/m) ) log( ( Ψ̂_n^*(i/m) − Ψ̂_n^*((i−1)/m) ) / ( Ψ_0(i/m) − Ψ_0((i−1)/m) ) ) |        (9)

with Ψ̂_n^* = Ψ_{F̂_n^*}.


n

3 Location-Scale Time Series Model


In this section we extend the CRE-based entropy test to location-scale models:

yt = gt (β1,0 ) + ht (β0 )ηt , t ∈ Z, (10)

where g : R^∞ × Θ_1 → R and h : R^∞ × Θ_m → R^+ are measurable functions, Θ_m = Θ_1 × Θ_2 with compact subsets Θ_1 ⊂ R^{d_1} and Θ_2 ⊂ R^{d_2}, g_t(β_{1,0}) = g(y_{t−1}, y_{t−2}, . . . ; β_{1,0}) and h_t(β_0) = h(y_{t−1}, y_{t−2}, . . . ; β_0), where β_0 = (β_{1,0}^T, β_{2,0}^T)^T denotes the true model parameter belonging to Θ_m, and {η_t}
is a sequence of iid random variables with mean zero and unit variance. Further,
{yt : t ∈ Z} is assumed to be strictly stationary and ergodic and ηt is indepen-
dent of past observations Ωs for s < t. Model (10) includes various GARCH type
models. Recently, Kim and Lee [6] verified the weak consistency of the bootstrap
entropy test based on the residuals calculated from Model (10).

In this section, we focus on the CRE-based GOF test for Model (10). To this
end, we set up the following hypotheses:

H0 : Fη ∈ {Fϑ : ϑ ∈ Θd } vs. H1 : not H0 , (11)

where Fη denotes the innovation distribution of the model and Fϑ can be any

family of distributions. In what follows, we assume that f_ϑ = F_ϑ′ exists, is positive, and is continuous, which also ensures the continuity of F_ϑ^{−1} in ϑ, due to Scheffé's theorem.
To implement a test, we check whether the transformed random variables

y − g (β )
t t 1,0
Ut = Fϑ0
ht (β0 )

follow a uniform distribution on [0, 1], where ϑ0 and β0 are the true parameters.
Since the parameters are unknown, we check the departure from U [0, 1] based on
yt −g̃t (β̂1,n )
Ût := Fϑ̂n (η̂t ) with η̂t = h̃t (β̂n )
, where g̃t (β1 ) = g(yt , yt−1 , . . . , y1 , 0, . . . ; β1 )
and h̃t (β) = h(yt , yt−1 , . . . , y1 , 0, . . . ; β) with β = (β1T , β2T )T ∈ Θm .
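For a concrete instance of model (10), take a GARCH(1,1): $g_t \equiv 0$ and $h_t^2 = \omega + \alpha y_{t-1}^2 + \beta h_{t-1}^2$. The following is a minimal sketch (my own illustration with a standard normal innovation null and fitted parameter values supplied by the user, not the author's code) of the residuals $\hat\eta_t$ and the transformed values $\hat U_t$:

import numpy as np
from scipy.stats import norm

def garch_residual_pit(y, omega, alpha, beta):
    # Model (10) with g_t = 0 and h_t^2 = omega + alpha*y_{t-1}^2 + beta*h_{t-1}^2;
    # pre-sample values are set to zero, as in the truncated versions g~_t, h~_t.
    n = len(y)
    h2 = np.empty(n)
    h2[0] = omega
    for t in range(1, n):
        h2[t] = omega + alpha * y[t - 1] ** 2 + beta * h2[t - 1]
    eta_hat = y / np.sqrt(h2)        # residuals eta_hat_t
    u_hat = norm.cdf(eta_hat)        # U_hat_t = F(eta_hat_t); approximately U[0,1] under H0
    return eta_hat, u_hat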
Lee and Kim [9] studied the asymptotic behavior of the residual empirical process:
$$\hat V_n(r) = \sqrt{n}\,(\hat F_n^R(r) - r), \quad 0 \le r \le 1,$$
with $\hat F_n^R(r) = \frac{1}{n} \sum_{t=1}^n I(F_{\hat\vartheta_n}(\hat\eta_t) \le r)$, where $\hat\vartheta_n$ is any consistent estimator of $\vartheta_0$ under the null, for example, the MLE. More precisely, Lee and Kim [9] showed that under certain regularity conditions,
$$\hat V_n(r) = V_n(r) + R_n(r) + o_p(1), \qquad (12)$$
uniformly in $r$, where $V_n(r) = \sqrt{n}\,(F_n(r) - r)$ with $F_n(r) = \frac{1}{n} \sum_{t=1}^n I(F_{\vartheta_0}(\eta_t) \le r)$ and
$$R_n(r) = \sqrt{n}\,(\hat\beta_{1,n} - \beta_{1,0})^T E\!\left[ \frac{1}{h_1(\beta_0)} \frac{\partial g_1(\beta_{1,0})}{\partial \beta_1} \right] f_{\vartheta_0}(F_{\vartheta_0}^{-1}(r)) + \sqrt{n}\,(\hat\beta_n - \beta_0)^T E\!\left[ \frac{1}{h_1(\beta_0)} \frac{\partial h_1(\beta_0)}{\partial \beta} \right] F_{\vartheta_0}^{-1}(r)\, f_{\vartheta_0}(F_{\vartheta_0}^{-1}(r)) + \sqrt{n}\,(\hat\vartheta_n - \vartheta_0)^T \frac{\partial F_{\vartheta_0}(F_{\vartheta_0}^{-1}(r))}{\partial \vartheta}.$$
Based on this fact, they found the limiting null distribution of the residual
entropy test and its bootstrap version.
The residual CRE test can be designed similarly to (6), that is,
$$\hat T_n^R = \sqrt{n}\, \sup_{w \in W} |IS_m^w(\hat F_n^R)| \quad \text{and} \quad \tilde T_n^R = \hat T_n^R + \frac{1}{\sqrt{nm}} \Big| \sum_{t=1}^n \big(F_{\hat\vartheta_n}(\hat\eta_t) - 1/2\big) \Big|. \qquad (13)$$

In implementation, using $\tilde w_{ij}$ in (7), we employ the test, similar to (7), as follows:
$$\hat T_n^R = \sqrt{n} \max_{1 \le j \le J} \Big| \sum_{i=1}^m \tilde w_{ij} \Big( \hat\Psi_n^R\big(\tfrac{i}{m}\big) - \hat\Psi_n^R\big(\tfrac{i-1}{m}\big) \Big) \log\!\bigg( \frac{\hat\Psi_n^R(\frac{i}{m}) - \hat\Psi_n^R(\frac{i-1}{m})}{\Psi_0(\frac{i}{m}) - \Psi_0(\frac{i-1}{m})} \bigg) \Big|, \qquad (14)$$
where $\hat\Psi_n^R$ is the same as $\hat\Psi_n$ with $\hat F_n$ replaced by $\hat F_n^R$.


Further, we use the parametric bootstrap method below to obtain the critical
values:
(i) Based on the data $y_1, \ldots, y_n$, obtain consistent estimators $\hat\vartheta_n$ and $\hat\beta_n$ (e.g., the MLEs).
(ii) Generate $\eta_1^*, \ldots, \eta_n^*$ from $F_{\hat\vartheta_n}(\cdot)$ and obtain $y_1^*, \ldots, y_n^*$ through Eq. (10) with $\beta_0$ replaced by its MLE $\hat\beta_n$, that is, $y_t^* = \tilde g_t(\hat\beta_{1,n}) + \tilde h_t(\hat\beta_n)\eta_t^*$ (see the sketch after this list). Then, calculate $\hat T_n^{R*}$ with a preassigned $m$ based on these random variables.
(iii) Repeat the above procedure B times and calculate the 100(1 − α)% per-
centile of the obtained B number of T̂nR∗ values.
(iv) Reject H0 if the value of T̂nR in (14) obtained from the original observations
is larger than the obtained 100(1 − α)% percentile in (iii).
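As a sketch of step (ii) for the GARCH(1,1) instance of (10) used later in Sect. 4 (standard normal innovations under the null; illustrative only, not the author's code), the bootstrap path is built recursively from the fitted parameters:

import numpy as np

def garch_bootstrap_path(n, omega, alpha, beta, rng):
    # y*_t = h~_t * eta*_t with eta*_t iid draws from the fitted innovation
    # distribution (here standard normal); pre-sample values set to zero.
    eta_star = rng.standard_normal(n)
    y_star = np.empty(n)
    h2 = omega
    for t in range(n):
        if t > 0:
            h2 = omega + alpha * y_star[t - 1] ** 2 + beta * h2
        y_star[t] = np.sqrt(h2) * eta_star[t]
    return y_star

# rng = np.random.default_rng(0)
# y_star = garch_bootstrap_path(500, 0.1, 0.1, 0.8, rng)   # one bootstrap replicate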
Below we discuss the weak convergence of the above bootstrap test. Let $U_t^* = F_{\hat\vartheta_n^*}(\hat\eta_t^*)$ with residuals $\hat\eta_t^* = \dfrac{y_t^* - \tilde g_t(\hat\beta_{1,n}^*)}{\tilde h_t(\hat\beta_n^*)}$, and define the bootstrap residual empirical process:
$$\hat V_n^*(r) = \sqrt{n}\,(\hat F_n^{R*}(r) - r), \quad 0 \le r \le 1,$$
with $\hat F_n^{R*}(r) = \frac{1}{n} \sum_{t=1}^n I(F_{\hat\vartheta_n^*}(\hat\eta_t^*) \le r)$.
Theorem 3.2 of Kim and Lee [6] shows that under some regularity conditions,
$$\hat V_n^*(r) = V_n^*(r) + R_n^*(r) + o_p^*(1), \qquad (15)$$
in probability, uniformly in $r$, which is a bootstrap version of (12), wherein $V_n^*(r) = \sqrt{n}\,(F_n^{R*}(r) - r)$ with $F_n^{R*}(r) = \frac{1}{n} \sum_{t=1}^n I(F_{\hat\vartheta_n}(\eta_t^*) \le r)$ and
$$R_n^*(r) = \sqrt{n}\,(\hat\beta_{1,n}^* - \hat\beta_{1,n})^T E\!\left[ \frac{1}{h_t(\beta_0)} \frac{\partial g_t(\beta_{1,0})}{\partial \beta_1^T} \right] f_{\hat\vartheta_n}(F_{\hat\vartheta_n}^{-1}(r)) + \sqrt{n}\,(\hat\beta_n^* - \hat\beta_n)^T E\!\left[ \frac{1}{h_t(\beta_0)} \frac{\partial h_t(\beta_0)}{\partial \beta^T} \right] F_{\hat\vartheta_n}^{-1}(r)\, f_{\hat\vartheta_n}(F_{\hat\vartheta_n}^{-1}(r)) + \sqrt{n}\,(\hat\vartheta_n^* - \hat\vartheta_n)^T \frac{\partial F_{\hat\vartheta_n}(F_{\hat\vartheta_n}^{-1}(r))}{\partial \vartheta}.$$
In our study, we use the bootstrap version of the test in (13), similar to (9), as follows:
$$\hat T_n^{R*} = \sqrt{n}\, \sup_{w \in W} |IS_m^w(\hat F_n^{R*})| \quad \text{and} \quad \tilde T_n^{R*} = \hat T_n^{R*} + \frac{1}{\sqrt{nm}} \Big| \sum_{t=1}^n \big(F_{\hat\vartheta_n^*}(\hat\eta_t^*) - 1/2\big) \Big|. \qquad (16)$$

Based on (15), we have the following.



Theorem 1. Let $\hat T_n^R$ and $\tilde T_n^R$ be the ones in (13), and $\hat T_n^{R*}$ and $\tilde T_n^{R*}$ be the ones in (16). Then under (A2), (A3), (B0)–(B4) in Kim and Lee [6], and $H_0$ in (11), we obtain
$$\hat T_n^R = \sup_{w \in W} \Big| \sum_{i=1}^m w_i \big( \overline V_n(s_i) - \overline V_n(s_{i-1}) \big) + \sum_{i=1}^m w_i \big( \overline R_n(s_i) - \overline R_n(s_{i-1}) \big) \Big| + o_p(1), \qquad (17)$$
$$\hat T_n^{R*} = \sup_{w \in W} \Big| \sum_{i=1}^m w_i \big( \overline V{}_n^*(s_i) - \overline V{}_n^*(s_{i-1}) \big) + \sum_{i=1}^m w_i \big( \overline R{}_n^*(s_i) - \overline R{}_n^*(s_{i-1}) \big) \Big| + o_p(1), \qquad (18)$$
where $\overline V_n(s) = \int_s^1 V_n(r)\, dr$, $\overline R_n(s) = \int_s^1 R_n(r)\, dr$, $\overline V{}_n^*(s) = \int_s^1 V_n^*(r)\, dr$, and $\overline R{}_n^*(s) = \int_s^1 R_n^*(r)\, dr$. Hence, we find that for all $-\infty < x < \infty$,
$$\big| P^*\big(\hat T_n^{R*} \le x\big) - P\big(\hat T_n^R \le x\big) \big| \to 0 \text{ in probability}, \qquad (19)$$
$$\big| P^*\big(\tilde T_n^{R*} \le x\big) - P\big(\tilde T_n^R \le x\big) \big| \to 0 \text{ in probability}. \qquad (20)$$

Proof. Under the null, we have $\hat\Psi_n^R(s) \to \Psi_0(s)$ in probability as $n \to \infty$, owing to (12). Then, by using the fact that $|\log(1+x) - x| \le x^2$ for $|x| \le 1/2$, we can write that on $A_n := \Big\{ \max_{1 \le i \le m} \Big| \frac{\hat\Psi_n^R(s_i) - \hat\Psi_n^R(s_{i-1})}{\Psi_0(s_i) - \Psi_0(s_{i-1})} - 1 \Big| \le 1/2 \Big\}$,
$$\hat T_n^R = \sup_w \Big| \sqrt{n} \sum_{i=1}^m w_i \big( \hat\Psi_n^R(s_i) - \hat\Psi_n^R(s_{i-1}) \big) \cdot \log\Big( \frac{\hat\Psi_n^R(s_i) - \hat\Psi_n^R(s_{i-1})}{\Psi_0(s_i) - \Psi_0(s_{i-1})} - 1 + 1 \Big) \Big|$$
$$= \sup_w \Big| \sum_{i=1}^m w_i \big( \overline{\hat V}_n(s_i) - \overline{\hat V}_n(s_{i-1}) \big) \cdot \frac{\hat\Psi_n^R(s_i) - \hat\Psi_n^R(s_{i-1})}{\Psi_0(s_i) - \Psi_0(s_{i-1})} \Big| + \delta_n$$
$$= \sup_w \Big| \sum_{i=1}^m w_i \big( \overline V_n(s_i) + \overline R_n(s_i) - \overline V_n(s_{i-1}) - \overline R_n(s_{i-1}) \big) \Big| + o_p(1),$$
where we have used the fact that $\overline{\hat V}_n(s) := \int_s^1 \hat V_n(r)\, dr = \overline V_n(s) + \overline R_n(s) + o_p(1)$ owing to (12) and
$$|\delta_n| \le \sqrt{n} \max_{1 \le i \le m} \Big( \frac{\hat\Psi_n^R(s_i) - \hat\Psi_n^R(s_{i-1})}{\Psi_0(s_i) - \Psi_0(s_{i-1})} - 1 \Big)^2 \max_{1 \le i \le m} \big| \hat\Psi_n^R(s_i) - \hat\Psi_n^R(s_{i-1}) \big| = o_p(1).$$
Then, since $P(A_n) \to 1$ as $n \to \infty$, (17) is asserted by the continuous mapping theorem. Similarly, using (15), we can show that (18) holds (cf. the proof of Theorem 2.3 of [10]). Then, in view of (17), (18), and (B4) in Kim and Lee [6], we can see that (19) is true, and so is (20), which validates the theorem. □
In implementation, we use the bootstrap version of the test in (14), similar
to (9), as follows:

Table 1. Empirical size and powers for the iid normal samples at the level of 0.05.

Size: 0.050
Power (mean change):  μ = 1: 0.053   μ = 2: 0.087   μ = 3: 0.577   μ = 4: 0.970   μ = 5: 0.957
Power (s.d. change):  σ = 2: 0.187   σ = 3: 0.587   σ = 4: 0.827   σ = 5: 0.943

Table 2. Empirical sizes for the GARCH(1,1) model with N (0, 1) innovations at the
level of 0.05.

(ω, α, β) Size
(0.1, 0.3, 0.6) 0.140
(0.1, 0.2, 0.7) 0.087
(0.1, 0.1, 0.8) 0.063

$$\hat T_n^{R*} = \sqrt{n} \max_{1 \le j \le J} \Big| \sum_{i=1}^m \tilde w_{ij} \Big( \hat\Psi_n^{R*}\big(\tfrac{i}{m}\big) - \hat\Psi_n^{R*}\big(\tfrac{i-1}{m}\big) \Big) \log\!\bigg( \frac{\hat\Psi_n^{R*}(\frac{i}{m}) - \hat\Psi_n^{R*}(\frac{i-1}{m})}{\Psi_0(\frac{i}{m}) - \Psi_0(\frac{i-1}{m})} \bigg) \Big|, \qquad (21)$$
$$\tilde T_n^{R*} = \hat T_n^{R*} + \frac{1}{\sqrt{nm}} \Big| \sum_{t=1}^n \big(F_{\hat\vartheta_n^*}(\hat\eta_t^*) - 1/2\big) \Big|$$
with $\hat\Psi_n^{R*} = \Psi_{\hat F_n^{R*}}$.

4 Simulation Study
Since the CRE-based GOF test is proved to outperform several existing GOF
tests in iid samples and works well for GARCH models as seen in the real data
example of Lee et al. [10], our simulation study focuses on examining the influ-
ence of the change points on GOF tests. For this task, we employ $\hat T_n^{R*}$ in (21) with $J = 1000$, $m = 5$, and apply it to iid normal and GARCH(1,1) samples with normal innovations. In the iid case, the sample is assumed to follow a N(0,1)

Table 3. Empirical powers for the GARCH(1,1) with N (0, 1) innovations at the level
of 0.05.

(ω0 , α0 , β0 ) → (ω1 , α1 , β1 ) Power


(0.1, 0.1, 0.2) → (0.1, 0.2, 0.7) 0.397
(0.1, 0.1, 0.2) → (0.1, 0.3, 0.6) 0.387
(0.1, 0.1, 0.2) → (0.1, 0.4, 0.5) 0.377
(0.1, 0.1, 0.7) → (0.1, 0.1, 0.8) 0.105
(0.1, 0.2, 0.5) → (0.1, 0.2, 0.7) 0.183
(0.1, 0.2, 0.5) → (0.1, 0.3, 0.6) 0.240
(0.1, 0.3, 0.3) → (0.1, 0.2, 0.7) 0.277
(0.1, 0.3, 0.3) → (0.1, 0.1, 0.8) 0.243

distribution under the null and to have a change from N(0,1) to $N(\mu, \sigma^2)$ at $n/2 = 50$ under the alternative. Here, we consider the two cases: (i) only $\mu$ changes and $\sigma$ is fixed; (ii) only $\sigma$ changes and $\mu$ is fixed. In the GARCH case, the model under the null is $y_t = \sigma_t \epsilon_t$ with $\sigma_t^2 = \omega + \alpha y_{t-1}^2 + \beta \sigma_{t-1}^2$ and $\{\epsilon_t\} \sim$ iid N(0,1). Under the alternative, $(\omega, \alpha, \beta)$ is assumed to change from $(\omega_0, \alpha_0, \beta_0)$ to $(\omega_1, \alpha_1, \beta_1)$ at $t = 50$. To save computational time, we use $n = 100$ with the number of bootstrap replications and the number of repetitions both equal to 300. Sizes and powers are calculated at the level of 0.05 for different parameter settings.
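A minimal sketch (not the author's simulation code) of the iid data-generating process just described, with a change from N(0,1) to N(μ, σ²) at n/2; the GARCH alternative is obtained analogously by switching (ω, α, β) at t = 50:

import numpy as np

def iid_change_point_sample(n=100, mu=0.0, sigma=1.0, seed=0):
    # N(0,1) before the change point at n/2, N(mu, sigma^2) afterwards;
    # the null corresponds to mu = 0 and sigma = 1 (no change).
    rng = np.random.default_rng(seed)
    k = n // 2
    return np.concatenate([rng.normal(0.0, 1.0, k), rng.normal(mu, sigma, n - k)])

# x = iid_change_point_sample(n=100, mu=3.0)   # a sample under the mean-change alternative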
Table 1 shows that the normality test for the iid case has no size distortion and produces higher power as the magnitude of the parameter change gets larger, indicating that a parameter change can severely damage the GOF test. On the other hand, Tables 2 and 3 show that the normality test for GARCH(1,1) innovations has some size distortions (possibly because $\alpha + \beta$ is close to 1) and produces lower power than that for the iid samples, which reveals a possibility that the influence of parameter changes can be reduced to a certain extent by model complexity. Our findings show that a change point test should be carefully performed in advance of conducting GOF tests.

5 Concluding Remarks
In this paper, we studied the CRE-based GOF test for location-scale time series models and its bootstrap version. We also carried out Monte Carlo simulations to examine the influence of parameter changes on the GOF test. The results reveal that parameter changes can strongly affect GOF tests and that a change point test should be carried out before conducting GOF tests. In particular, parameter changes in GARCH models appeared to affect the GOF test to a lesser degree than those in iid samples. This might be a reason that the GARCH model often passes model-checking tests well even in the presence of change points. However, more experiments are needed before a firm conclusion can be reached, because here we only considered the GARCH model with normal innovations and the CRE-based entropy test.

As such, we plan to extend this work to other sophisticated time series models, such as smooth transition GARCH models with non-normal innovations (see Khemiri [5] and Meitz and Saikkonen [17]), and to other GOF tests, including the Anderson-Darling test, with more extensive empirical studies.

Acknowledgements. This research is supported by Basic Science Research Program


through the National Research Foundation of Korea (NRF) funded by the Ministry of
Science, ICT and Future Planning (No. 2018R1A2A2A05019433).

References
1. Baratpour, S., Rad, H.: Testing goodness-of-fit for exponential distribution based on cumulative residual entropy. Commun. Stat. Theory Methods 41, 1387–1396 (2012)
2. D’Agostino, R.B., Stephens, M.A.: Goodness-of-Fit Techniques. Marcel Dekker,
Inc., New York (1986)
3. Durbin, J.: Weak convergence of the sample distribution function when parameters
are estimated. Ann. Stat. 1, 279–290 (1973)
4. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106, 620–
630 (1957)
5. Khemiri, R.: The smooth transition GARCH model: application to international stock indexes. Appl. Financial Econ. 21, 555–562 (2011)
6. Kim, M., Lee, S.: Bootstrap entropy test for general location-scale time series
models with heteroscedasticity. J. Stat. Comput. Simul. 13, 2573–2588 (2018)
7. Lee, S.: A maximum entropy type test of fit: composite hypothesis case. Comput. Stat. Data Anal. 57, 59–67 (2013)
8. Lee, J., Lee, S., Park, S.: Maximum entropy test for GARCH models. Stat.
Methodol. 22, 8–16 (2015)
9. Lee, S., Kim, M.: On entropy test for conditionally heteroscedastic location-scale
time series models. Entropy 19(8), 388 (2017)
10. Lee, S., Park, S., Kim, B.: On entropy-type goodness of fit test based on integrated
distribution. J. Stat. Comput. Simul. 88, 2447–2461 (2018)
11. Lee, S., Maintanis, S., Cho, M.: Inferential procedures based on the integrated
empirical characteristic function. AStA Adv. Stat. Anal., 1–30 (2018)
12. Lee, S., Taniguchi, M.: Asymptotic theory for ARCH models: LAN and residual
empirical process. Statistica Sinica 15, 215–234 (2005)
13. Lee, S., Vonta, I., Karagrigoriou, A.: A maximum entropy type test of fit. Comput.
Stat. Data. Anal. 55, 2635–2643 (2011)
14. Lee, S., Wei, C.Z.: On residual empirical process of stochastic regression models
with applications to time series. Ann. Stat. 27, 237–261 (1999)
15. Lee, Y., Lee, S.: On CUSUM tests for general nonlinear integer-valued GARCH models. Ann. Inst. Stat. Math., Online published (2018)
16. Maekawa, K., Lee, S., Tokutsu, Y., Park, S.: Cusum test for parameter changes in
GARCH(1,1) models with applications to Tokyo stock data. Far East J. Stat. 18,
15–23 (2006)
17. Meitz, M., Saikkonen, P.: Parameter estimation in nonlinear AR-GARCH models. Econometric Theory 27, 1236–1278 (2011)
18. Oh, H., Lee, S.: Modified residual CUSUM test for location-scale time series models
with heteroscedasticity. Ann. Inst. Stat. Math., Online published (2018)
19. Rao, M., Chen, Y., Vemuri, B.C., Wang, F.: Cumulative residual entropy: a new
measure of information. IEEE Trans. Inf. Theory 50, 1220–1228 (2004)
The Quantum Formalism in Social
Science: A Brief Excursion

Emmanuel Haven1,2(B)
1
Memorial University, St. John’s, Canada
ehaven@mun.ca
2
IQSCS, Leicester, UK

Abstract. This contribution discusses several examples on how social


science problems can begin to be re-interpreted with the aid of elements
of the formalism of quantum mechanics.

1 Introduction
It is surely not new to involve formalisms from other disciplines in social science.
The most obvious example is of course the use of mathematics in areas such as
economics and psychology. Post war economics as led by luminaries like Kenneth
Arrow and Gérard Debreu was responsible for defining a plethora of economics
concepts with the aid of mathematics. Do we have an equivalent ‘School’ which
contributed towards the rigorous definition of economics concepts with the aid of
physics? The answer shall be ‘no’. Although econophysics1 published an impor-
tant array of papers, its oeuvre did not really enter mainstream economics as
was the case with the Arrow-Debreu school which mathematized economics.
This paper will, prima facie, go in a slightly different direction. We will be concerned with laying out, in brief fashion, some of the applications of the formalism of
quantum mechanics in social science. For those readers who are completely new to
this area, we can from the outset, already mention that, although there is no main-
stream area of economics2 which will cater towards those very specific applications,
there is (now) surely a mainstream component in mathematical psychology which
uses those formalisms. The interested reader should peruse the books by Khren-
nikov [1]; Nguyen [2] and Busemeyer [3] to get a much better idea.
This paper is meant to give a very brief overview of some of the applica-
tions without wanting to pretend to be in any way exhaustive. We will discuss in
the sequel some applications involving the area of finance, more specifically with
regards to:

– arbitrage/non arbitrage
– value versus price and pricing rules
– decision making
1
Econophysics is a movement which has endeavoured to apply statistical mechanics
concepts mostly to finance but also to economics.
2
And even less so in finance!

We finish the paper with a discussion on memory in finance models, and we


shall again introduce some of the quantum formalism in that discussion too.

2 What is NOT Implied with the Use of the Quantum Formalism?

As is the case with much of interdisciplinary work, knowing the true limits of such
work is incredibly difficult. What do I mean? If one uses mathematics to further
one’s rigorous understanding of economics or finance, then such true limits do not
really occur. Why? Mathematics is a universal language which can be applied
to any cognitive discipline. One cannot say the same about physics. Physics studies events which pertain to nature. No work in physics was ever conceived for applications to the social sciences. This argument is almost perfectly intuitive
in the case of quantum physics, which studies nature at an incredibly small
scale. To come back to the argument that mathematics is a universal language
applicable to any domain of knowledge, it is precisely this argument we can use
to argue why we can borrow elements of the quantum mechanics formalism and
apply them in macroscopic environments like social science. In other words, the
mathematical apparatus of quantum mechanics is used in those applications.
Hence, no implications can be formulated which would make statements like “so
you are in fact saying that the financial world is quantum-mechanical?3 ”

3 Very Basic Elements of the Quantum Formalism

This paper can and will not be a repository where the basics of the quantum for-
malism are explained. We refer back to Nguyen [2] and Haven and Khrennikov [4]
for the basic ideas. See also Haven, Khrennikov and Robinson [5]. Essentially, one
needs to realize that a big difference between classical mechanics and quantum
mechanics is that in the latter one uses (finite or infinite) dimensional Hilbert
space. Thus, a vector space is used. An essential difference between quantum
mechanics and classical mechanics is the distinction one MUST make between
measurements and states. Position is an example of a measurement and typically
we say that an ‘observable’ is measurable (i.e. they are represented by operators
which are Hermitian4 ). In classical mechanics the state and measurement are
the same thing. They are definitely different in quantum mechanics. This is an
essential difference.

3
Those sorts of arguments I hear often. They are expected, but they also show that when individuals make those statements, they probably will not have read much of the mainstream literature on the interface of quantum mechanics and social science.
4
Think of an operator as an instruction. A Hermitian operator expressed in matrix
form will essentially say this: if you take the transpose of a matrix (and you multiply
each element with its complex conjugate), then if that yields the original matrix, the
matrix is Hermitian.

4 Some Applications
4.1 Arbitrage/non Arbitrage
Arbitrage is a key concept in finance and intuitively one could define an arbitrage
opportunity as a way to realize a risk free profit. The absence of such profits
is assumed in the derivation of academic finance models. A good example is
the Black-Scholes option pricing theory [6]. When reformulating option pricing
theory within a Hamiltonian5 framework, Baaquie [7] has shown that the Black-
Scholes Hamiltonian is not Hermitian. This non-Hermiticity of the Hamiltonian
is intimately linked to the arbitrage condition. In quantum mechanics Hamil-
tonians need to be Hermitian. In Haven, Khrennikov and Robinson [5] (p. 140
and following), we discuss how considering non Hermitian Hamiltonians may be
plausible in the context of so called open systems. In an open system there is
an interaction with an environment (i.e. one does not have an isolated system).
Khrennikova, Haven and Khrennikov [8] provide for arguments on how to use
such an open system within the context of political science. The environment is
seen there as a set of information.
We may also wish to actively involve the state function in the setup of the so called non-arbitrage theorem in finance (see Haven and Khrennikov [9]). The
basic idea here is that a change in the state function could trigger arbitrage.
The state function is interpreted as an information wave function and is the
key input in the generation of a probability. This is a very simple application
of quantum mechanics in that the wave function (with its complex conjugate)
yields a probability. The wave function can be seen as a probability amplitude (or
probability wave). For those readers who are really new to elementary quantum
mechanics, the probability amplitude is the key device by which we can, in a
formal way, describe the well-known double slit experiment6 . This experiment
forces us to use the so called interference term in the basic quantum probability formulation.

4.2 Value Versus Price and Pricing Rules


Another basic idea from quantum mechanics is the idea of superposition of states.
In a financial context, we could think of so called ‘value’ states as opposed to
price states. If we work with the elements of a vector space in quantum mechanics
then we shall employ so called ‘kets’. They are often denoted with a ‘>’ symbol.
The idea is then as follows: consider the price of an asset to be a superposition of (say) four value states: $|p\rangle = b_1|v_1\rangle + b_2|v_2\rangle + b_3|v_3\rangle + b_4|v_4\rangle$, where
5
The Hamiltonian is the sum of potential and kinetic energy. In a quantum mechan-
ical context, when the Hamiltonian becomes an operator, this forms a key part in
the rendering of the so called Schrödinger partial differential equation (PDE). This
PDE describes the undisturbed evolution of a state (in time dependent or time
independent fashion). It is a central equation in quantum mechanics.
6
Many textbooks exist which introduce quantum physics. A great book to consider
is Bowman [10].

$|b_i|^2$ = probability of each value to occur. We need to note that such a formulation
is not at all without criticism. As an example, we can query the validity of linear
independence of the value states.
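As a toy numerical illustration (the amplitudes and value levels below are hypothetical, not drawn from the text): the probabilities $|b_i|^2$ attached to the value states should sum to one, and the implied expected value of the asset is $\sum_i |b_i|^2 v_i$.

import numpy as np

values = np.array([90.0, 100.0, 110.0, 120.0])          # hypothetical value states v_1,...,v_4
b = np.array([0.1 + 0.2j, 0.5 + 0.0j, 0.0 + 0.6j, 0.5 - 0.3j])
b = b / np.linalg.norm(b)                               # normalize so that sum |b_i|^2 = 1

probs = np.abs(b) ** 2                                  # |b_i|^2 = probability of value state i
expected_value = float(np.dot(probs, values))           # implied expected asset value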
We can go a little further and also consider so called pricing rules. Such rules
are steeped in a little more context. Essentially, the argument revolves around
the idea that Bohmian mechanics as an interpretation of quantum mechanics,
can serve a very targeted purpose within our environment of applications in
social science. Bohmian mechanics is a physics theory and was never developed
with social science applications in mind. The interested reader is referred to
Bohm [11,12] and Bohm and Hiley [13]. In Bohmian mechanics, a key concept
is the quantum potential (which is narrowly connected with a measure of infor-
mation). It can be claimed, that for a large part, the rationale for using the
quantum formalism in social science traces back to the need of wanting to have
an information formalism. The quantum potential depends on the amplitude of
a wave function. You may recall from your high school physics that the force
is the negative gradient of the potential. As an example, say that the price of
an asset is p. Assume there exists an amplitude function R(p). The quantum
potential $Q(p)$⁷ can be formulated and its force, $-\partial Q/\partial p$, calculated. Why could this be a pricing rule? One can easily come up with examples where, for instance, if $p$ is small and $p$ increases, there is a negative force which resists the price going up further. However, when $p$ is large and the price increases, the resistance decreases. Those two cases give an idea that there is some pricing rule hiding behind those forces. See also Haven and Khrennikov [9] for explicit examples.
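A rough numerical sketch of this idea (my own illustration with an assumed Gaussian-shaped amplitude R(p); the proportionality constant is suppressed, and, as footnote 7 notes, no Planck constant appears in the macroscopic version): the quantum potential behaves like $-R''(p)/R(p)$, and the force $-\partial Q/\partial p$ can then be read off numerically.

import numpy as np

p = np.linspace(1.0, 200.0, 2001)            # price grid
R = np.exp(-((p - 100.0) / 40.0) ** 2)       # an assumed amplitude function R(p)

# Macroscopic quantum potential, up to a positive constant: Q(p) = -R''(p) / R(p)
Q = -np.gradient(np.gradient(R, p), p) / R
force = -np.gradient(Q, p)                   # the "pricing-rule" force -dQ/dp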

4.3 Decision Making


As we mentioned at the beginning of this paper, in mainstream mathematical
psychology, the quantum formalism has made inroads to such an extent that
nowadays it is considered as a mainstream contribution. There is a dynamic
literature on that very topic. Excellent sources to consider for much more infor-
mation on this topic are Khrennikov [1] and Busemeyer and Bruza [3]. In a
nutshell, research in this area started via the observation that in decision mak-
ing formalisms such as the various expected utility frameworks used in economics
and finance8 , there are deviations from the normative behavior (as prescribed
by the axiomatic frameworks of expected utility). One very well known paradox
is the so called Ellsberg paradox (Ellsberg [15]). In summary form, one considers
a so called two stage gamble. This is a fancy way of saying that one gambles in a first period, call it period $t$, and then subsequently, at period $t' > t$, one gambles again. However, the decision to gamble at time $t'$ is conditional upon knowing what the outcome of the gamble was at time $t$.
You are either informed that:
– (i) the first gamble was a win; or
7
No Planck constant occurs in the macroscopic version of this potential!.
8
If you are not sure what those expected utility frameworks are, a great book covering the intricacies is Kreps [14].

– (ii) the first gamble was a loss; or


– (iii) there is no information on what the outcome was in the first gamble

There is an intuitive principle embedded in one of the axiomatic structures of expected utility, which essentially says that there is no reason for you not to prefer to gamble at time $t'$ when you have no information about the outcome of the gamble at time $t$, IF you prefer to gamble at $t'$ whether you are informed that you lost or that you won at time $t$. This principle has been violated, consistently, in many experiments. It was first noticed in work published by Shafir and Tversky [16]. In Busemeyer and Wang [17] two approaches are juxtaposed: a Markov approach and a quantum-like approach. Following the Markov approach, frequencies obtained through real world experiments indicate that 59% would gamble (if informed they lost) and 69% would gamble (if informed they won). The case where no information is given at time $t$ should give a frequency which is the average of the above frequencies. However, in repeated experiments
which is the average of the above frequencies. However, in repeated experiments
this is not the case. The quantum approach remedies this situation by defining
the ‘no information’ state as a superposition of both the ‘informed-won’ state and
‘informed-lost’ state. We work here in Hilbert space. Recall, from the beginning
of the paper that quantum mechanics will require this. In the use of the basic
quantum probability rule, it is the interference term which can accommodate
the observed frequencies.
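In symbols (a standard quantum-probability decomposition, written here for illustration rather than quoted from [17]): represent the 'no information' state as a superposition $a_W|W\rangle + a_L|L\rangle$ with $|a_W|^2 = |a_L|^2 = 1/2$. The Markov (classical) prediction is the plain average, while the quantum rule adds an interference term:
$$P_{\text{classical}}(G) = \tfrac{1}{2}P(G|W) + \tfrac{1}{2}P(G|L) = \tfrac{1}{2}(0.69 + 0.59) = 0.64,$$
$$P_{\text{quantum}}(G) = \tfrac{1}{2}P(G|W) + \tfrac{1}{2}P(G|L) + \sqrt{P(G|W)P(G|L)}\,\cos\theta \approx 0.64 + 0.64\cos\theta.$$
A sufficiently negative phase term ($\cos\theta < 0$) pulls the predicted gambling rate under 'no information' well below the classical average of 0.64, which is exactly the direction of the deviation observed in the repeated experiments mentioned above.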

5 Time Asymmetries and Memory


In finance, we do work especially in derivative pricing theory, with random
motions which exhibit very little memory. One can explicitly embed a mem-
ory component in a random model, via the use of so called fractional Brownian
motion but this is not the object of our discussion in this paper. Another issue
is time. In the title of this section we wrote the words ‘time asymmetry’. What
is meant by this? We need to recall that classical mechanics prescribes that
a process is perfectly time reversible. We have alluded above about Bohmian
mechanics and the quantum potential when we discussed the pricing rule.
Nelson [18,19] has derived the quantum potential in an alternative way. In so
doing he defines the drift part of a Brownian motion (when the infinitesimal
change of position of a variable is being formalized) in two different ways: by
looking at future positions and by looking at past positions. In classical, Newto-
nian mechanics, the difference between those two drift rates is zero. However, in
the approach by Nelson, the difference is non-zero and so called osmotic velocity
then obtains. The non-zero difference also signifies that there is now asymme-
try between past and future time. It can also be remarked that such non-zero
term can be related to the existence of so called Fisher information. We need to
add that Fisher information can be used in modelling information in economics
(Hawkins and Frieden [20]). We do not expand on it further in this paper.
If we focus on formalizing information in economics and finance, within a
completely mainstream setting (i.e. without any connections to physics), then

we surely need to mention a model which provides for an excellent discussion


on how to model private information (as opposed to public information). Those
types of information play a key role in determining the level of efficiency in a
market. Intuitively, we want to think of an efficient market as a market which
becomes more efficient the less individuals can influence the said market. Public
information, is, as its name denotes, information which can be used by many,
whilst at the other end of the accessibility spectrum figures private information.
Detemple and Rindisbacher [21] define the concept of ‘Private information Price
of Risk (PIPR)’, as representing “the incremental price of risk assessed when pri-
vate information becomes available.” ([21], p. 190). They discuss how the infor-
mation gain, attached to the presence of private information, can be measured.
This important formalism is set within an environment which is memory-less
and there is no asymmetry between past and future time.
If we want to continue focussing on modelling information in social science,
then we could take a resolutely different route. What if we consider a formalism
which is not memory-less and does allow for asymmetry between past and future
time? I conclude this paper with an argument that such model can exist and I
will now expand a little on this.
The physics model upon which such a proposed formalism could be based derives from the so called 'walking droplet' model ([22]), which seems to feature characteristics that can also be found in quantum mechanics. See Hardesty [24]
and Bush [25]. To quote work of Wind-Willassen, Harris and Bush [23]: “Drops
bouncing on a vibrating fluid bath have recently received considerable attention
for two principal reasons. ....in certain parameter regimes, the bouncers walk
horizontally through resonant interaction with their wave field. The resulting
walkers represent the first known example of a pilot-wave system.” (p. 082002-1).
The pilot-wave system the authors mention refers to Bohmian mechanics. For the droplet to bounce, it needs a vibrating surface. Both the droplet and surface
are made out of the same liquid. Experiments show that there are guiding
waves which influence the droplet. The droplet’s motion is influenced by wave
superposition occurring from positions occupied by the droplet in past time.
Hence, there is an embedded memory property.
Although it is difficult to judge how close this is to Bohmian mechanics, the
interested reader is referred to Bush [25] and Fort et al. [26] for a discussion
on quantization arguments. One could make the argument that such a model
could hold serious promise as an explicit formalism to model information in
an economics/finance setting. But one needs to actively wonder if both (i) the
presence of memory and (ii) the asymmetry in time are desirable properties in
a finance or economics environment. Here are some initial examples of analogies
we could make:
From physics: (i) The droplet bounces at time t at a height h (in the bath);
(ii) the orbital wave is generated upon impact; (iii) two coordinates: time on
X-axis and position (height, h) on Y-axis

by analogy to finance (i), (ii) and (iii) become: (i) the price level is generated
at time t; (ii) it has an information impact; (iii) price level is height (on Y-axis)
and it occurs at time t (on X-axis)
What is important is that this macroscopic model is parameter rich. A key
issue then becomes how, on the basis of hard data, we want to establish analogies
between this model and economics and finance.

References
1. Khrennikov, A.: Ubiquitous Quantum Structure: from Psychology to Finance.
Springer (2010)
2. Nguyen, H.T.: Quantum Probability for Behavioral Economics. Short Course at
BUH. New Mexico State University (2018)
3. Busemeyer, J. R., Bruza, P.: Quantum Models of Cognition and Decision.
Cambridge University Press, Cambridge (2012)
4. Haven, E., Khrennikov, A.Yu.: The Palgrave Handbook of Quantum Models in
Social Science, pp. 1–17. Springer - Palgrave MacMillan (2017)
5. Haven, E., Khrennikov, A., Robinson, T.: Quantum Methods in Social Science: A
First Course. World Scientific, Chapter 10 (2017)
6. Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit.
Econ. 81, 637–659 (1973)
7. Baaquie, B.: Quantum Finance. Cambridge University Press, Cambridge (2004)
8. Khrennikova, P., Haven, E., Khrennikov, A.: An application of the theory of open
quantum systems to model the dynamics of party governance in the US political
system. Int. J. Theoret. Phy. 53(4), 1346–1360 (2014)
9. Haven, E., Khrennikov, A.: Quantum Social Science. Cambridge University Press,
Cambridge (2013)
10. Bowman, G.: Essential Quantum Mechanics. Oxford University Press, Oxford
(2008)
11. Bohm, D.: A suggested interpretation of the quantum theory in terms of hidden
variables. Phys. Rev. 85, 166–179 (1952a)
12. Bohm, D.: A suggested interpretation of the quantum theory in terms of hidden
variables. Phys. Rev. 85, 180–193 (1952b)
13. Bohm, D., Hiley, B.: The Undivided Universe: An Ontological Interpretation of
Quantum Mechanics. Routledge and Kegan Paul, London (1993)
14. Kreps, D.: Notes on the Theory of Choice. Westview Press (1988)
15. Ellsberg, D.: Risk, ambiguity, and the Savage axioms. Q. J. Econ. 75, 643–669 (1961)
16. Shafir, E., Tversky, A.: Thinking through uncertainty: nonconsequential reasoning
and choice. Cognit. Psychol. 24, 449–474 (1992)
17. Busemeyer, J.R., Wang, Z.: Quantum information processing: explanation for inter-
actions between inferences and decisions. In: Quantum Interaction - AAAI Spring
Symposium (Stanford University), pp. 91–97 (2007)
18. Nelson, E.: Derivation of the Schrödinger equation from Newtonian mechanics.
Phys. Rev. 150, 1079–1085 (1966)
19. Nelson, E.: Stochastic mechanics of particles and fields. In: Atmanspacher, H.,
Haven, E., Kitto, K., Raine, D. (eds.) 7th International Conference on Quantum
Interaction QI 2013. Lecture Notes in Computer Science, vol. 8369, pp. 1–5 (2013)

20. Hawkins, R.J., Frieden, B. R.: Quantization in financial economics: an information-


theoretic approach. In: Haven, E., Khrennikov, A. (eds.) The Palgrave Handbook of
Quantum Models in Social Science: Applications and Grand Challenges. Palgrave-
Macmillan Publishers (2015)
21. Detemple, J., Rindisbacher, M.: The private information of risk. In: Haven, E.
et al. (eds.) The Palgrave Handbook of Post Crisis Financial Modelling. Palgrave-
MacMillan Publishers (2015)
22. American Institute of Physics: Walking droplets: strange behavior of bouncing
drops demonstrates pilot-wave dynamics in action. Science Daily, 1 October 2013
23. Wind-Willassen, O., Molácek, J., Harris, D.M., Bush, J.: Exotic states of bouncing
and walking droplets. Phys. Fluids 25, 082002 (2013)
24. Hardesty, L.: New math and quantum mechanics: fluid mechanics suggests alter-
natives to quantum orthodoxy. M.I.T. ScienceDaily, 12 September 2014
25. Bush, J.W.M.: Pilot wave hydrodynamics. Ann. Rev. Fluid Mech. 47, 269–292
(2015)
26. Fort, E., Eddi, A., Boudaoud, A., Moukhtar, J., Couder, Y.: Path-memory induced
quantization of classical orbits. Proc. Nat. Acad. Sci. USA 107(41), 17515–17520
(2010)
How Annualized Wavelet Trading
“Beats” the Market

Lanh Tran(B)

Department of Statistics, Indiana University, Bloomington, IN 47408, USA


LanhTran14@gmail.com

Abstract. The market refers to the S&P 500 stock index SPY, which
is an important benchmark of U.S. stock performances, and “beating”
the market means earning a return greater than the market. The pur-
pose of this paper is to showcase an annualized wavelet trading strategy
(WT) that outperforms the market at a fast rate. The strategy is con-
tained in the website AgateWavelet.com. No prediction of market prices
is involved and using the website does not require any skills on the part of
the trader. By trading the index SPY back and forth about 4 to 5 times
a week for a year, the wavelet trader WT has an expected rate of return
approximately 26% higher than the market. The Sharpe ratios are com-
puted and they show that WT also has a higher expected risk-adjusted
return than the market. The result is a surprise since SPY has long been
considered to be a stock to buy and hold. In addition, proponents of the
Efficient Market and Random Walk hypotheses claim that the market
is “unbeatable” because market prices are unpredictable. Thus WT also
provides a counterexample to this claim.

1 Introduction
The market in this paper represents the S&P 500, which is an important bench-
mark of U.S. stock performances, and “beating” the market means earning a
return greater than the market. Is there any trading strategy that can “beat”
the market? This question is of much interest to stock traders, academicians
and people with interest in business finance and economics. There is plenty of
empirical evidence and statistics against the existence of such a strategy; Barber
and Odean [3] show that traders who buy and sell frequently usually end up los-
ing more money than those who trade less often. In addition, proponents of the
random walk hypothesis (RWH) and efficient market hypothesis (EMH) assert
that stocks take an unpredictable path and hence it is impossible to outperform
the overall market consistently. There is a large body of literature on the EMH
and RMH. For a bibliography, see Fama [4–6], Fama and French [7], Malkiel [8],
Tran [12] and the references therein. Readers interested in predicting the market
are referred to a recent book by Yardani [13] for more information.
Recently, Tran [12] has shown that there are strategies that “beat” the market
consistently without involving price prediction. However, these strategies are
not quite useful in real-life trading due to the long waiting time to “beat” the

market. The purpose of the current paper is to showcase a more effective wavelet
trading strategy (WT) that “beats” the market at a fast rate. The new strategy
is contained in the website AgateWavelet.com. Trading decisions made by the
website depend on movements of wavelets caused by price fluctuations.
Relevant data to the paper is displayed at
https://iu.box.com/s/spy64zjv0fx9fa11ndvt1zt419m5c2xj
which will be referred to as “the link”. The SPY historical data set displayed at
the link lists the dates and corresponding adjusted closing prices of SPY from
January 29th, 1993 to January 26th, 2018. The data was downloaded on January
27th, 2018 from the Yahoo Finance website online. A date of the year will often
be displayed in the same style of the SPY data set. For example, January 27th,
2018 is written as 1/27/2018 or 1/27/18.
The paper is organized as follows: Sect. 2 discusses stock market trading
in general and also presents the assumptions. Section 3 explains the website in
detail. The focus is on annualized trading which lasts exactly one year. WT is
programmed to buy 1,000 shares of SPY on the first trading day and to liquidate all her shares one year later. On each day of trading during a calendar year, the trader enters the date and corresponding price of SPY and activates the website.
The computer program in the website processes the information, and then tells
WT whether to buy, sell or hold. The exact number of shares to trade is also
specified. An example using dates and prices of SPY for a year is provided.
Section 4 discusses Sharpe ratios which are used to show that WT has a higher
risk-adjusted return than the market. Section 5 shows that WT outperforms the
market using historical data.
Geometric Brownian motion is employed in the Black-Scholes model and is
a very popular model for stock market prices. In Sect. 6, a Geometric Brownian
motion (GBM) is fitted to the historical SPY data. Section 7 compares the per-
formance of WT versus the market using simulated data generated by the fitted
GBM. Again, WT outperforms the market with an expected return higher than
the market by about the same percentage found in Sect. 5. The computed Sharpe
ratio shows that WT has a Sharpe ratio higher than the market.
Section 8 provides some descriptive statistics using graphs to compare the
performances of WT and the market. Section 9 explains the idea behind WT’s
strategy and Sect. 10 discusses the results of the paper and some other relevant
issues. Some material from Tran [12] is included here for completeness. The
mathematical formulas used in the paper are presented at the link.
The numbers from Sects. 4–8 demonstrate clearly that:
Based on historical data and an unlimited amount of simulated data, after a
year of trading the SPY back and forth, the trader WT has an expected return
higher than the market by about 26%. In addition, WT has a higher risk-adjusted
return than the market.
A game-theoretic argument then easily shows that WT “beats” the market
consistently by repeatedly employing the year-long strategy at the website. This

is quite surprising since it has long been considered that SPY is a stock to buy
and hold and that it is impossible to “beat” the market.

2 Stock Trading

Margin. Buying on margin means borrowing money from the brokerage to


buy stocks. The collateral for the borrowed funds is the stocks and cash in the
investor’s account. A regular trader can easily get a 2:1 margin loan at any
brokerage with a margin account. Interactive Brokers (IB) is a large premier
brokerage and currently an owner of a special memorandum account at IB can
get a margin of 2.25:1. Suppose WT is allowed to hold a margin leverage of a:1
where a is 2 or higher, then for every dollar she has in her account, she can buy
up to a dollars in stocks.
Cash. If WT buys on margin and owes the brokerage money, then cash is a
negative number and she has to pay interest to the brokerage on the money that
she owes. Cash is positive if she has surplus money in her account.
Trading by WT is done through a brokerage which must satisfy the following
assumptions:

Assumption 1. The brokerage charges an interest rate no higher than 4% per


year and collects interest daily (compounded once a day). The trader gets 2%
interest for surplus cash in her account.

Assumption 2. The brokerage allows WT a margin leverage a:1 where 2 ≤ a ≤


2.25 and checks WT’s margin every day (usually around 4:00 PM) before closing
time.

Assumption 3. The brokerage charges a commission equal to .04% of the cost of


a transaction.

Currently the margin rate of interest at Interactive Brokers (IB) is around


3.2%. However, IB does not pay interest for surplus cash in the trader’s account.
WT’s cash is almost never positive, so the interest she gets does not affect her
return much. A trader with a lot of surplus cash can transfer her money to a
sweep account to collect some interest. The result of the paper would not change
much if WT gets no interest for surplus cash.
The commission charged by IB is $1.00 if the number of shares per trade is
200 or less; it is $0.005 per share if a trade involves more shares. The commission
charged by the brokerage in Assumption 3 is rather high.
Market Value. Since WT holds only SPY stocks, the market value equals the
number of shares multiplied by the current SPY share price.
Account Balance. The account balance is the total value of WT’s account. It
is the net liquidation value of her account which equals the sum of market value
plus cash.

Buying Power. This is the amount of money still available for WT to buy
stocks. Since her account balance is the sum of the market value and cash, her
buying power equals:

$$\text{Buying Power} = a(MV + \text{Cash}) - MV = (a - 1)\,MV + a \times \text{Cash}.$$

It is important for WT to maintain a positive buying power at all times. She gets a


margin call if her buying power becomes negative.
Interest. Assume that interest is collected once a day.
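A small sketch (illustrative only; the website's internal bookkeeping is not published) of the buying-power rule and the daily interest under Assumptions 1-2, using a 365-day convention as an assumption of mine:

def buying_power(market_value, cash, a=2.0):
    # Buying Power = a*(MV + Cash) - MV = (a - 1)*MV + a*Cash; negative => margin call
    return (a - 1.0) * market_value + a * cash

def daily_interest(cash, borrow_rate=0.04, deposit_rate=0.02, days_per_year=365):
    # Interest collected once a day: paid on borrowed money (cash < 0) at 4% per year,
    # earned on surplus cash (cash > 0) at 2% per year (Assumption 1).
    rate = borrow_rate if cash < 0 else deposit_rate
    return cash * rate / days_per_year

# Example: 1,000 SPY shares worth $286,580 with $100,000 borrowed on margin
# bp = buying_power(286_580.0, -100_000.0, a=2.25)   # positive, so no margin call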

3 The Website

Click on www.AgateWavelet.com and the screen shows:


Welcome to AgateWavelet.com
Run Annualized Wavelet Trading Strategy
Enter Data. Click on “Run Annualized Wavelet Trading Strategy”, and then
enter the margin ratio that the brokerage allows you. Click on “Enter Data” and
then type in the date and price of SPY. The date needs to have two digits for
the month, two digits for the day and four digits for the year.
Let us start with an example. Consider the set of data SPY-Histo-Data in
Exhibit 1 stored in the Folder Data at the link. This set of data contains a list
of dates and prices of SPY from 1/26/18 to 1/25/19. You can cut and paste this
data set on the screen. If you choose the excel.csv file, then you need to format
the dates column so the year is displayed with 4 digits. If you choose excel.xlsx,
then the years in the dates column are already written with 4 digits, so no
formatting is needed. You should download the files before doing any copying
and pasting.
Generate Data. Click on “Generate Data” to activate the website. A table
(Results Table) appears at the bottom of the page with information instructing
the trader as to sell, buy or hold. The outputs on the upper right hand corner
are the Excel files: result.csv, result-no-zeros.csv and result.xlsx. They can be
downloaded and opened in your computer. Result.csv is a text file which can
be opened with a text editor but no text formatting can be done with this file.
Result-no-zeros.csv is the same as result.csv with days of no trading deleted
and result.xlsx contains the same information as result.csv but looks nicer. The
output files are displayed in the folder Output in Exhibit 1 at the link.
Reset Inputs. You should press “Reset Inputs” to clear the memory of the
computer at the website before you enter new data on the screen.
The variables in result.csv are similar to the variables in the tables displayed
in Tran [12]. A full detailed explanation of these variables can be found there. I
now briefly describe them for completeness.
The first three columns list respectively the trading dates, share prices at
which trades are made and the numbers of shares traded.

The fourth column lists the commissions charged by the brokerage for each
trade.
The fifth column (CumCom) lists cumulative commissions, which are total
commissions paid up to the current trading day.
The sixth column (Cost) lists the cost of each trade which equals the number
of shares traded multiplied by the share price.
The seventh column (CumCost) lists the cumulative cost paid by the trader.
The eighth column (CumShares) lists the total number of shares held by the
trader after each trade.
The ninth column (MV) lists the market value of the trader’s stocks on each
trading day.
The tenth column lists the amount of cash in the trader’s account.
The eleventh column lists WT’s buying power.
The twelfth column lists the interest WT pays or collects for the cash in her
account.
The thirteenth column lists the account balance, which is equal to the market
value of WT’s shares plus cash in her account.
The fourteenth column lists WT’s return, which equals her account balance
subtracted by the sum of her original investment and cumulative commissions.
The fifteenth column lists the return of the market, which is what WT’s return
would be had she bought 1,000 shares on 1/26/18 and held them without trading.
The formulas and computations used to make result.csv are displayed in the file Formula.xlsx in Exhibit 1 at the link.

4 Sharpe Ratios
The ex-post Sharpe Ratio (see Sharpe [10,11]) is used for risk-adjusted returns.
The rates of returns of WT are bench-marked against the realized values of SPY.
The computation of SR is carried out for result.csv above and the details are
displayed in the files SR.csv and SR.xlsx contained in the folder Sharpe Ratio
at the link. The market has SR equal to zero while WT has SR equal to 0.12. Thus WT has higher risk-adjusted returns than the market (buy-and-hold).
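A sketch of the ex-post Sharpe ratio in the sense of Sharpe [10,11], with WT's returns benchmarked against the market's (the exact return-differencing conventions of SR.xlsx may differ):

import numpy as np

def ex_post_sharpe(trader_returns, benchmark_returns):
    # Ex-post Sharpe ratio: mean of the differential return D_t = R_t - B_t
    # divided by its sample standard deviation.
    d = np.asarray(trader_returns, dtype=float) - np.asarray(benchmark_returns, dtype=float)
    return d.mean() / d.std(ddof=1)

By construction the benchmark itself has an ex-post Sharpe ratio of zero, consistent with the market's SR of zero reported above.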

5 Performance of WT on Historical Data


The historical data set displayed in the folder Historical Data at the link contains
6,295 dates and prices of SPY starting on 1/29/93 and ending on 1/26/18. Let us
compare the performances of WT and the market using past historical data. The
folder HistoDataPerfom (HDP) at the link contains 6,043 files in the “inputs”
created as follows:
random-0000.csv lists the dates and prices of SPY from 1/29/93 to 1/28/94,
random-0001.csv lists the dates and prices of SPY from 2/1/93 to 1/31/94,
...
random-6042.csv lists the dates and prices of SPY from 1/26/2017 to
1/25/2018,

The folder “outputs” contains trades made by WT using the files in the
“inputs” folder.
Upload File. This button is created to check quickly that the output files
correspond to the input files. Download “inputs” to your computer then use
the “Upload File” button to upload an input file. Click on “Generate Data” to
obtain output files.
The buying power in the output files is always positive, indicating that WT
never gets a margin call. The results from the output files are summarized in
stats.csv in the folder Stats. Some interesting statistics are:
WT and the market’s average rates of return are, respectively, equal to
13.92% and 10.79%. Thus WT earns on average a rate of return 29.03% higher
than the market. The average ex-post Sharpe ratio is about 0.046. The formulas
to compute the statistics are in the file stats.xlsx in HDP.

6 Fitting a GBM to Historical Data


The equation for a geometric Brownian motion (GBM) is given by:
$$S_t = S_0 \exp\Big( \big(\mu - \tfrac{\sigma^2}{2}\big) t + \sigma W_t \Big),$$
where $W_t$ is standard Brownian motion. Here, $S_t$ is the value of the GBM at time $t$ and $S_0$ is the initial value. The parameters $-\infty < \mu < \infty$ and $\sigma > 0$ are constants. GBM serves as an important example of a stochastic process satisfying a stochastic differential equation.
The mean and variance of $S_t$ are:
$$E(S_t) = S_0 \exp(\mu t), \qquad Var(S_t) = S_0^2 \exp(2\mu t)\big(\exp(\sigma^2 t) - 1\big).$$
Let $X(t) = \log S_t - \log S_{t-1}$. Then
$$X(t) = \mu - \tfrac{\sigma^2}{2} + \sigma(W_t - W_{t-1}).$$

Hence X(t) is normally distributed with mean μ − (σ 2 /2) and standard deviation
σ. A total of 6,294 values of Xt is obtained from the spy-historical.data set at the
link. Using these values, estimates of the drift parameter μ − (σ 2 /2) and standard
deviation σ are, respectively, 0.00037294 and 0.011597345. These estimates are,
respectively, the sample mean and sample standard deviation of 6,294 observations
of Xt ’s. The fitting of the GBM is carried out in the file GBM-fit.xlsx at the link.

7 Simulation Using the Website


This section is devoted to comparing, by simulation, the performances of WT and
the market for the period of one year. The historical data ends on 1/26/2018 so

let us choose this date to start the year. On this day the price of SPY is 286.58
dollars a share. The simulated data is generated using the equation
 
$$S_t = 286.58\, \exp\big( 0.00037294\, t + 0.011597345\, W_t \big).$$

On 1/26/2018, the trading starts with S0 = 286.58. The next trading day
is 1/29/2018 since 1/27/2018 and 1/28/2018 are weekend days. The simulated
price of SPY on 1/29/2018 is S1 .
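A sketch of how such a year of simulated prices can be generated from the fitted GBM (the seed and implementation details are mine; the website's generator is not published):

import numpy as np

def simulate_gbm_year(s0=286.58, drift=0.00037294, sigma=0.011597345,
                      n_days=261, seed=0):
    # S_t = s0 * exp(drift*t + sigma*W_t) sampled at daily steps,
    # so the daily log increments are iid N(drift, sigma^2).
    rng = np.random.default_rng(seed)
    log_increments = drift + sigma * rng.standard_normal(n_days)
    log_path = np.concatenate(([0.0], np.cumsum(log_increments)))
    return s0 * np.exp(log_path)               # S_0, S_1, ..., S_261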
Random Data. Go to the webpage and click on “Run Annualized Wavelet
Trading Strategy” and then enter the margin that you hold. Click on “Random
Data” and then on “Generate Data”. The website generates an Excel.csv file
containing simulated values of SPY for one year. There are a total of 261 values
since weekend days are excluded. You can continue by clicking on “Generate
Table” to see how WT trades with this set of data.
If you play with “Random Data” long enough, you will see that WT is more
likely to “beat” the market in a “bull year” when prices increase, and the market
is more likely to “beat” WT in a “bear” year when prices decrease. Since there
are more “bull” years than “bear” years, WT “beats” the market in the long
run.
Multi File Simulator. The Multi-File Simulator page is created to replicate
“Random data” up to 1,000 times. Go to the web page and click on Multi-File
simulator then enter the margin ratio allowed by your brokerage and the number
of times you want to replicate. Let us set the number of times to be 1,000. Click
on “Run Simulation” then wait a couple of minutes. The website generates three
files: stats.csv, inputFiles.csv and outputFiles.csv.
inputFiles.csv contains 1,000 sets of simulated data for the trading year begin-
ning on 1/26/18 and ending on 1/25/19.
outputFiles.csv contains the trades of WT corresponding to the data in the
input files.
stats.csv contains relevant statistics obtained from an analysis of the the
output files.
Exhibit 2 at the link contains the results of 1,000 replications of “Random
Data” as an example. Denote the difference between the returns of WT and the
market by alpha. Denote the average return of WT and the market, respectively,
by WT Ave and Market Ave. The statistics from stats.csv are summarized below:
(1) The difference alpha is positive in 575 cases out of 1,000.
(2) WT Ave and Market Ave are, respectively, $42,561.23 and $33,784.8 for
the 1,000 replications. WT’s trading produces an average increase in return
of $8,776.43 or 25.98% for the year.
(3) WT’s expected rate of return is 100×(42, 561.23/286, 694.63)%, or 14.83%
and the market’s rate of return is 100 × (33, 784.8/286, 694.63)%, or 11.78%.
By trading occasionally, WT’s yearly rate of return is 25.89% higher than the
market.
(4) WT’s expected Sharpe ratio is taken as the average of the 1,000 Sharpe
ratios. It is approximately equal to 0.025.

The “% Inc. Return” in stats.csv is the average percentage increase in return


due to trading. This number makes sense only when Market Ave is positive. If
you set the number of random files in the Multi-File Simulator to be 100 or
above, then Market Ave is positive with probability practically equal to 1.
The expected values reported in the introduction are computed from averages
of numerous runs by the Multi-File Simulator. Note that the buying powers of
WT in the output files in Exhibit 2 are always positive, indicating that WT
never gets a margin call.
The use of Sharpe ratio rests on the assumption that returns are normally
distributed. Daily returns have a tendency to be heavy-tailed (see Nagahara [9],
for example).
However, yearly returns of WT and the market, being sums of 261 daily
returns, have distributions that are better approximated by normal distributions.
This follows since sums of random variables with finite means and variances tend
to normality by the central limit theorem. The graphs of the densities of WT’s
returns and market’s returns will be presented in the next section.
The calculation of Sharpe ratio using yearly returns is in file SR-
YearlyRet.xlsx in Exhibit 2. The Sharpe ratio is found to be 0.298.
The next section is devoted to provide some descriptive statistics regarding
WT’s returns and market’s returns.

8 Graphs
QQ Plot. The QQ plot is a graph of the quantiles of the trader’s returns against
the quantiles of the market’s returns. The QQ plot is displayed below.
The returns of WT and the market are sorted in increasing order and paired
together for a total of 1,000 pairs. The data set obtained is referred to as QQ-
Plot.csv displayed in the folder Quantile-QQ-Plot at the link.
The QQ-Plot.csv file can be used to compute the percentiles of the trader’s
and market returns. The 1st quartile, 2nd quartile (median) and 3rd quartile
of the returns of WT are, respectively, −$22,631.28, $34,388.36 and $94,222.46.
The 1st quartile, 2nd quartile (median) and 3rd quartile of the returns of the
market are, respectively, −$11,194.63, $28,455.37, and $69,945.37.

(i) Look at the straight line going through the points (0,0) and (200,000,
200,000). A point on the QQ plot below this line indicates that WT does
better than the market and any point above it indicates otherwise.
(ii) The point where the graph intersects the horizontal axis occurs when market
price stays put after one year. This point is slightly to the left of the origin,
indicating that the market performs better than WT if market price does
not increase by the end of the trading year.
(iii) The graph lies almost in a straight line, showing that the trader’s returns
and market’s returns have similar distributions. They are only different by
a change of scale and location.

(iv) In the upper right hand side and lower left side, the scattered points indicate
that WT’s return has a heavier tail than the market.
(v) The QQ plot indicates that WT tends to underperform the market in a
bear year and vice versa outperforms the market in a bull year. The gain
of WT over the market in a bull year is likely bigger than the gain of the
market over WT in a bear year. At any moment, WT is more likely to be
in bull than bear territory since SPY has an increasing trend. Hence WT
“beats” the market in the long run.

[Figure: QQ plot of Market vs Trading. Quantiles of WT's trading returns ("Trading Return", horizontal axis) are plotted against quantiles of the market's returns ("Market Return", vertical axis).]
Kernel Density Estimators of Trader’s Returns and Market Returns.
Below are graphs of kernel estimators of the probability density functions (pdf)
of WT’s and market’s returns. The graphs are continuous and look nicer than
histograms.

Kernel Density Plot of Marketing vs Trading

6e−06

4e−06
variable
density

Trading
Market

2e−06

0e+00

0 200,000 400,000
value
Note the following:

(i) The pdf of WT’s returns clearly has heavier tails than the pdf of the mar-
ket’s returns and is more skewed toward the right hand side. The kernel
density plots clearly show that WT’s returns are greater than the market
in the upper tail, but lower than the market in the lower tail. There is
a region in the middle where positive returns can be achieved with high
probability using the market strategy.
(ii) The probability that market’s returns exceed $270,000 is practically zero,
whereas, the probabilty of this event is about .01 for WT’s returns. The
probability that market’s losses exceed $100,000.00 is practically zero,
whereas, the probability of this event is about 0.0275 for WT’s return.
(iii) The mean and standard deviation of the market’s returns are, respectively,
$33,784, and $59,193, whereas, the mean and standard deviation of the
trader’s returns are, respectively, $42,561, and $87,411.

Kernel Density Estimator of Alpha. Below is the kernel density estimator


for the pdf of the alpha. Recall that alpha is the difference between the returns
of WT and the market.

[Figure: Kernel density plot of alpha — density (vertical axis, 0e+00 to 1e−05) against value (horizontal axis, −50,000 to 150,000).]

(i) Note that alpha is negative if WT's return is less than the market's return.
The probability that alpha is negative is .425.
(ii) The distribution of alpha is skewed to the right. The mean and standard
deviation of alpha are, respectively, $8,776 and $29,412. The median of alpha
is $5,616, which is much smaller than the mean; this is due to the long right
tail of the distribution of alpha.
(iii) Suppose the distribution of alpha has mean μ and variance σ². Consider
the problem of testing H0: μ = 0 against HA: μ > 0. Let Ā and s denote,
respectively, the sample mean and sample standard deviation of alpha. The
usual test rejects H0 when Ā√1000/s is large. Under H0, this statistic is
approximately normally distributed with mean zero and variance 1 by the
central limit theorem. A simple computation shows that the sample statistic
equals 8776√1000/29412 ≈ 9.44, which is quite large. The right decision is to
reject the null hypothesis H0, since the p-value of the test is practically zero.
(iv) The 95% confidence interval for alpha is 8776 ± 1.96 × 29412/√1000, or
[6953, 10599]. The 95% confidence interval for the "% Inc. Return" is
[100 × 6953/33784, 100 × 10599/33784], or [20.58, 31.37].
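The arithmetic in (iii) and (iv) can be reproduced directly from the quoted summary statistics; the script below uses only those numbers.

    import math

    n, mean_alpha, sd_alpha, mean_market = 1000, 8776, 29412, 33784

    z = mean_alpha * math.sqrt(n) / sd_alpha            # test statistic, about 9.44
    half_width = 1.96 * sd_alpha / math.sqrt(n)
    ci_alpha = (mean_alpha - half_width, mean_alpha + half_width)   # about (6953, 10599)
    ci_pct_inc = tuple(100 * x / mean_market for x in ci_alpha)     # about (20.58, 31.37)
    print(round(z, 2), ci_alpha, ci_pct_inc)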

9 Theory Behind WT’s Strategy


WT starts trading with a buy of 1,000 shares. The only money she ever invests
is the amount that she pays for the 1,000 shares and the commission to buy
these shares. Using margin leverage, she is allowed to buy additional shares.
The number of additional shares she can buy varies with her buying power. At
any point of time, her shares are more likely to increase in price than decrease
since SPY has an increasing trend. To outperform the market, she trades back
and forth while trying to hold on to a minimum of 1,000 shares. Her strategy
is to increase her cumulative number of shares gradually while avoiding margin
calls. She borrows on margin to pay all expenses: the money to buy extra shares,
the interest on the amount borrowed, and trading commissions. The sum of the
interest and extra commissions she pays during the year is likely less than her
gain from the increases in the prices of her additional shares. This happens since SPY
tends to infinity at a sufficiently fast rate. By the end of the year, she has a
higher expected rate of return than the market. Her Sharpe ratio is also higher
than the market's.
Avoiding margin calls is the hardest part of the strategy. Note that WT
always maintains a positive buying power. She buys at price dips and attempts
to sell at higher prices. Occasionally, however, she has to sell at a loss; this is done
to avoid margin calls when the price drops dramatically. Nevertheless, since SPY tends
to increase, she sells high and buys low more often than she buys high and sells low.
The important variables are interest rates, commissions, account balances
and buying power. The numbers of shares traded also vary with the depths
of price dips, heights of price peaks, among other variables. They look rather
bizarre due to the unpredictable movements of market prices.
How does WT “beat” the market consistently? Think of her trading as a
series of games with each lasting for a year. She has a higher expected return
than the market in each game. By repeatedly playing the games, she will “beat”
the market in the long run with probability one. The method of the current
paper is game-theoretic and is much simpler to employ than the complicated
asymptotic approach used in [12].

10 Discussion
1. WT can start with an amount of shares, not necessarily equal to 1,000 at the
beginning of a trading year. Suppose she starts with 2,000 shares. Then she
can buy twice as many shares as the number recommended by the website.
2. How many shares should WT begin with at the beginning of a trading year?
There is no mathematical answer to this question. She should just start with
whatever she can afford.

3. SPY is considered to be a low risk stock to trade and a hedge fund manager
may be able to get a margin leverage up to 10:1 (see Ang et al. [1] and
Baker and Filbeck [2]).
4. Under the Capital Asset Pricing Model (CAPM), an investor is assumed
to be able to borrow as much as she wants at the risk-free rate. A margin
leverage of 10:1 may not then be unreasonable under CAPM. A trader
can possibly increase her expected return with a higher leverage. However
borrowing too much can decrease the trader’s risk adjusted return.
5. A trader using WT should check the price of SPY daily. She is required
to buy, sell or hold according to the guidance of WT. However, her yearly
expected return would not change much if she skips a day occasionally.
6. What if WT enters the date and price of SPY several times a day on the
screen? This will probably increase WT’s return slightly by the end of the
year. No definite answer on this is available at this time.
7. Buying or selling at prices approximately equal to the prices recommended
by WT would not affect the trader’s overall return much.
8. In Sect. 4, WT earns on average a rate of return about 29.03% higher than
the market. The price of SPY is not assumed to behave according to any
model.
9. The geometric Brownian motion (GBM) is used in the Black-Scholes model. It is
used in the Multi File Simulator to generate an unlimited amount of SPY
data. Fitting a GBM to the SPY historical data works well for the
purpose of comparing WT and the market; a minimal simulation sketch is
given after this list. Using simulated data, WT outperforms the market with
a yearly rate of return about 25.89% higher than the market, which is not
too far below the 29.03% found using real data. The 26% reported in the
abstract and introduction appears to be a conservative estimate.
10. The general belief is that it is impossible to “beat” a GBM since technical
analysis is useless in predicting future prices. Simulation using the Multi
File Simulator clearly shows that predictability of future prices is not a
necessary condition for a trader to consistently “beat” the market. This is
quite surprising.
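A minimal sketch of how daily SPY-like prices can be generated from a geometric Brownian motion for experiments of this kind; the drift, volatility and starting price are placeholders, not the values fitted in the Multi File Simulator.

    import numpy as np

    def gbm_path(s0, mu, sigma, n_days=261, seed=None):
        """Simulate one year of daily prices from a geometric Brownian motion."""
        rng = np.random.default_rng(seed)
        dt = 1.0 / n_days
        # Exact discretization: S_{t+dt} = S_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)
        increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_days)
        return s0 * np.exp(np.concatenate(([0.0], np.cumsum(increments))))

    # Placeholder parameters: 8% annual drift, 15% annual volatility, starting price $250.
    prices = gbm_path(s0=250.0, mu=0.08, sigma=0.15, n_days=261, seed=42)
    print(prices[0], prices[-1])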

Exercise. Assume that WT has a 2:25 margin leverage from her brokerage. Use
the Multi-File Simulator at the website to show that WT’s expected annual
return is approximately 30% higher than the market. Find also the ex-post
Sharpe ratio of WT and show that WT has a higher risk-adjusted return than
the market.

References
1. Ang, A., Gorovyy, S., van Inwegen, G.B.: Hedge fund leverage. J. Financ. Econ.
102, 102–126 (2011)
2. Baker, H.K., Filbeck, G.: Hedge Funds: Structure, Strategies, and Performance.
Oxford University Press, Oxford (2017)

3. Barber, B.M., Odean, T.: Trading is hazardous to your wealth: the common stock
investment performance of individual investors. J. Financ. 2, 773–806 (2000)
4. Fama, E.F.: The behavior of stock market prices. J. Bus. 38, 34–105 (1965a)
5. Fama, E.F.: Random walks in stock prices. Financ. Anal. J. 21, 55–59 (1965b)
6. Fama, E.F.: Efficient capital markets: a review of theory and empirical work. J.
Financ. 25, 383–417 (1970)
7. Fama, E.F., French, K.R.: The capital asset pricing model: theory and evidence.
J. Econ. Perspect. 18, 25–46 (2004)
8. Malkiel, B.G.: A Random walk down Wall Street, 1st edn. W. W. Norton & Co.,
New York (1973)
9. Nagahara, Y.: Non-Gaussian distribution for stock returns and related stochastic
differential equation. Financ. Eng. Jpn. Mark. 3, 121–149 (1966)
10. Sharpe, W.F.: Capital asset prices: a theory of market equilibrium under conditions
of risk. J. Financ. 19, 425–442 (1964)
11. Sharpe, W.F.: Adjusting for risk in portfolio performance measurement. J. Portfolio
Manag. 1(2), 29–34 (1975)
12. Tran, L.T.: How wavelet trading “beats” the market. J. Stock Forex Trading 6,
1–6 (2017)
13. Yardani, E.: Predicting the Markets: A Professional Autobiography. Amazon.com
(2018)
Flexible Constructions for Bivariate
Copulas Emphasizing Local Dependence

Xiaonan Zhu1, Qingsong Shan2, Suttisak Wisadwongsa3, and Tonghui Wang1(B)

1 Department of Mathematical Sciences, New Mexico State University, Las Cruces, USA
{xzhu,twang}@nmsu.edu
2 School of Statistics, Jiangxi University of Finance and Economics, Nanchang, China
qingsongshan@gmail.com
3 Graduate School, Chiang Mai University, Chiang Mai, Thailand
titansteng@gmail.com

Abstract. In this paper, a flexible method for constructing bivariate


copulas is provided, which is a generalization of the so-called “gluing
method” and “rectangular patchwork” constructions. A probabilistic
interpretation of the construction is provided through the concept of
the threshold copula. Properties of the construction and best-possible
bounds of copulas with given threshold copulas are investigated. Exam-
ples are given for the illustration of our results.

Keywords: Copula · Construction of bivariate copulas · Local dependence · Best-possible bound

1 Introduction

For the purpose of describing the dependence among random variables, in recent
years, copulas are extensively studied by researchers and have been applied in
many fields, e.g., decision science [2], reliability theory [21,26,34], risk models
[24] and hydrology [36].
By Sklar’s theorem, the importance of copulas stems from two aspects. First,
most dependence properties of random variables can be captured by copulas,
which are independent of the marginal distributions and which are, in general, easier
to handle than the original joint distributions, e.g., [7,22,37,39,42,43,45,
46,48], etc. Second, copulas provide an efficient way to construct multivariate
distributions with given marginal distributions. Therefore, it is important to
consider constructions of copulas, e.g., for bivariate copulas, [10,11,16,28,38,
47], and for multivariate copulas, [17,49], etc. Especially, the so-called “gluing
method” and “rectangular patchwork” constructions of copulas were studied by
[38] and [16], respectively, in which copulas are glued on rectangular subsets
of I 2 . In this article, those constructions are generalized into a more flexible

setting, and its probabilistic interpretation is provided through the concept of


the threshold copula [15].
The paper is organized as follows. Necessary concepts and properties of copu-
las are briefly reviewed in Sect. 2. Our main results are given in Sect. 3, in which
a flexible construction of bivariate copulas and its probabilistic interpretation
are given, and properties and best-possible bounds with given threshold copulas
are investigated. Conclusion is given in Sect. 4.

2 Preliminaries

Let I = [0, 1] and I² = [0, 1] × [0, 1]. A function C : I² → I is called a bivariate
copula if it satisfies the following three conditions:

(i) C(u, 0) = C(0, v) = 0 for any u, v ∈ I;
(ii) C(u, 1) = u and C(1, v) = v for all u, v ∈ I;
(iii) C is 2-increasing on I², i.e., for every 0 ≤ u1 ≤ u2 ≤ 1 and 0 ≤ v1 ≤ v2 ≤ 1,

$$V_C\big([u_1,u_2]\times[v_1,v_2]\big)=C(u_2,v_2)-C(u_2,v_1)-C(u_1,v_2)+C(u_1,v_1)\ \ge\ 0,$$

where V_C([u1, u2] × [v1, v2]) is called the C-volume of [u1, u2] × [v1, v2].


Throughout this paper, we will focus on bivariate copulas. Thus, in the sequel, we
may omit the word “bivariate”. Let X and Y be random variables defined on a
probability space (Ω, A , P ) with the joint distribution function H and marginal
distribution functions F and G, respectively, where Ω is a sample space, A is a
σ-algebra of Ω and P is a probability measure on A . By Sklar’s theorem [40],
there is a copula C such that for any x and y in R̄ = R ∪ {−∞, ∞},

H(x, y) = C(F (x), G(y)). (1)

If F and G are continuous, then the copula C is unique and C is called the
copula of X and Y . There are three important copulas defined as follows.

Π(u, v) = uv, M (u, v) = min{u, v} and W (u, v) = max{u + v − 1, 0}

for all (u, v) ∈ I 2 . Π is called the product copula or independence copula. M and
W are called, respectively, the Fréchet-Hoeffding upper bound and lower bound
of copulas. Π, M and W have the following well-known properties:

(i) Let X and Y be continuous random variables. Then X and Y are independent if
and only if their copula is Π.
(ii) W(u, v) ≤ C(u, v) ≤ M(u, v) for every copula C and every (u, v) ∈ I².
(iii) Let X and Y be continuous random variables. Then Y is almost surely an
increasing (or decreasing) function of X if and only if their copula is M (or W).

For a comprehensive introduction to the theory of copulas, the reader
is referred to the monographs [18,27].
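A quick numerical illustration of property (ii), with the three copulas implemented on a grid; this is only a verification aid and is not part of the paper.

    import numpy as np

    def Pi(u, v): return u * v                         # product (independence) copula
    def M(u, v):  return np.minimum(u, v)              # Fréchet–Hoeffding upper bound
    def W(u, v):  return np.maximum(u + v - 1.0, 0.0)  # Fréchet–Hoeffding lower bound

    u, v = np.meshgrid(np.linspace(0, 1, 101), np.linspace(0, 1, 101))
    # Property (ii): W <= C <= M for every copula C, illustrated here with C = Pi.
    assert np.all(W(u, v) <= Pi(u, v) + 1e-12) and np.all(Pi(u, v) <= M(u, v) + 1e-12)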

3 Main Results
3.1 A Flexible Construction and Its Properties

First of all, let’s select a copula C0 and use P0 to denote the probability measure
induced by C0 on I 2 . C0 is called a background copula since it will determine
the “flatness” of I 2 in our construction. Let P = {Ai }ni=1 be a collection of
measurable subsets of I 2 such that P0 (Ai ∩ Aj ) = 0 for all i = j, where n
is an integer. Without loss of generality, we may assume P0 (Ai ) > 0 for each
i = 1, · · · , n. Also, let C = {Ci }ni=1 be a collection of copulas. Note that C0
and some Ci ’s may be identical and we require P and C to have the same
cardinality. For convenience, we use a triple (P, C , C0 ) to denote a collection of
subsets of I 2 , a collection of copulas and a background copula satisfying above
conditions throughout this work. Then, we have the following construction.
Definition 1. Given a triple (P, C, C0), we define a function C_{P,C,C0} : I² → I
such that

$$C_{\mathcal P,\mathcal C,C_0}(u,v)=\sum_{i=1}^{n} a_i\,C_i\big(P_0([0,u]\times I\mid A_i),\ P_0(I\times[0,v]\mid A_i)\big)+P_0\big(([0,u]\times[0,v])\cap A^c\big) \qquad (2)$$

for all (u, v) ∈ I², where a_i = P0(Ai), i = 1, · · · , n, A^c = I² \ ⋃ⁿᵢ₌₁ Ai, and P0(·|·)
denotes the conditional probability. C_{P,C,C0} is called the copula generated by
(P, C, C0).
The following theorem shows that Definition 1 is well-defined.
Theorem 1. The function CP ,C ,C0 defined by Eq. (2) is a copula.

Proof. (i) For any v ∈ I,

$$C_{\mathcal P,\mathcal C,C_0}(0,v)=\sum_{i=1}^{n} a_i\,C_i\big(P_0(\{0\}\times I\mid A_i),\,P_0(I\times[0,v]\mid A_i)\big)+P_0\big((\{0\}\times[0,v])\cap A^c\big)=0.$$

Similarly, C_{P,C,C0}(u, 0) = 0 for every u ∈ I.

(ii) For each v ∈ I,

$$C_{\mathcal P,\mathcal C,C_0}(1,v)=\sum_{i=1}^{n} a_i\,C_i\big(P_0([0,1]\times I\mid A_i),\,P_0(I\times[0,v]\mid A_i)\big)+P_0\big(([0,1]\times[0,v])\cap A^c\big)
=\sum_{i=1}^{n} P_0\big((I\times[0,v])\cap A_i\big)+P_0\big((I\times[0,v])\cap A^c\big)=P_0(I\times[0,v])=v.$$

Similarly, C_{P,C,C0}(u, 1) = u for any u ∈ I.

(iii) For any 0 ≤ u1 ≤ u2 ≤ 1 and 0 ≤ v1 ≤ v2 ≤ 1,

$$
\begin{aligned}
V_{C_{\mathcal P,\mathcal C,C_0}}\big([u_1,u_2]\times[v_1,v_2]\big)
&=C_{\mathcal P,\mathcal C,C_0}(u_2,v_2)-C_{\mathcal P,\mathcal C,C_0}(u_2,v_1)-C_{\mathcal P,\mathcal C,C_0}(u_1,v_2)+C_{\mathcal P,\mathcal C,C_0}(u_1,v_1)\\
&=\sum_{i=1}^{n} a_i\Big[C_i\big(P_0([0,u_2]\times I\mid A_i),P_0(I\times[0,v_2]\mid A_i)\big)-C_i\big(P_0([0,u_1]\times I\mid A_i),P_0(I\times[0,v_2]\mid A_i)\big)\\
&\qquad\qquad-C_i\big(P_0([0,u_2]\times I\mid A_i),P_0(I\times[0,v_1]\mid A_i)\big)+C_i\big(P_0([0,u_1]\times I\mid A_i),P_0(I\times[0,v_1]\mid A_i)\big)\Big]\\
&\qquad+P_0\big(([0,u_2]\times[0,v_2])\cap A^c\big)-P_0\big(([0,u_1]\times[0,v_2])\cap A^c\big)-P_0\big(([0,u_2]\times[0,v_1])\cap A^c\big)+P_0\big(([0,u_1]\times[0,v_1])\cap A^c\big)\\
&=\sum_{i=1}^{n} a_i\,V_{C_i}\Big(\big[P_0([0,u_1]\times I\mid A_i),\,P_0([0,u_2]\times I\mid A_i)\big]\times\big[P_0(I\times[0,v_1]\mid A_i),\,P_0(I\times[0,v_2]\mid A_i)\big]\Big)\\
&\qquad+P_0\big(((u_1,u_2]\times(v_1,v_2])\cap A^c\big)\ \ge\ 0.
\end{aligned}
$$

Thus, C_{P,C,C0}(u, v) is a copula. □


 
Remark 1. (i) The last term P0(([0,u] × [0,v]) ∩ A^c) in Eq. (2) can be rewritten as

$$P_0\big(([0,u]\times[0,v])\cap A^c\big)=a_{A^c}\,C_{A^c}\big(P_0([0,u]\times I\mid A^c),\,P_0(I\times[0,v]\mid A^c)\big)$$

for some copula C_{A^c}, so that it has the same form as the other terms, where
a_{A^c} = P0(A^c). However, in general, it is not easy to derive the closed form of
C_{A^c}. The benefit of using the expression P0(([0,u] × [0,v]) ∩ A^c) for the last term
of Eq. (2) is that C_{P,C,C0} is identical to C0 on A^c, i.e., C_{P,C,C0}(u, v) = C0(u, v)
for any (u, v) ∈ A^c.
(ii) In the constructions of the "gluing method" [38] and the "rectangular patchwork" [16],
copulas are linked only over rectangles, i.e., Cartesian products of two closed
subintervals of I; in the construction given by Definition 1, the Ai's can be any
subsets of I², so our method includes those constructions. The following examples
also show that our construction is very flexible.

Example 1. (i) Choose C1 = {M, Π, W } and P1 = {D1 , D2 , D3 }, where D1 =


{(u, v) ∈ I 2 : v ≥ 2u}, D2 = {(u, v) ∈ I 2 : 0.5u ≤ v ≤ 2u} and D3 = {(u, v) ∈
I 2 : v ≤ 0.5u}. The graphs of the partition and CP 1 ,C 1 ,Π are given in Fig. 1.
[Fig. 1. Graphs of the partition P1 and C_{P1,C1,Π}.]

[Fig. 2. Graphs of the partition P2 and C_{P2,C2,Π}.]

(ii) Let C2 = {M, Π, W } and P2 = {E1 , E2 , E3 } where E1 = {(u, v) ∈ I 2 : v ≥


1 − (1 − u)2 }, E2 = {(u, v) ∈ I 2 : u2 ≤ v ≤ 1 − (1 − u)2 } and E3 = {(u, v) ∈ I 2 :
v ≤ u2 }. The graphs of the partition and CP 2 ,C 2 ,Π are given in Fig. 2.
The background copula C0 is nontrivial in the next example.
Example 2. Consider the Ali-Mikhail-Haq copula [3] given by

$$C^{AMH}(u,v)=\frac{uv}{1-(1-u)(1-v)}. \qquad (3)$$

Let P_d be the partition of I² by the diagonal u = v. Then C_{P_d,{M,W},C^{AMH}}
is different from C_{P_d,{M,W},Π} generated by (P_d, {M, W}, Π). The graph of
C_{P_d,{M,W},C^{AMH}} and the contour plot of the difference C_{P_d,{M,W},C^{AMH}} −
C_{P_d,{M,W},Π} are given in Fig. 3.
In general, it is not easy to derive the explicit expression of a construction
defined by Eq. (2). The next result gives us the general formulas for constructions
when C0 = Π and I 2 is partitioned by a straight line from the left edge u = 0
of I 2 to the right edge u = 1.
[Fig. 3. C_{P_d,{M,W},C^{AMH}} and contour plot of C_{P_d,{M,W},C^{AMH}} − C_{P_d,{M,W},Π}.]

Corollary 1. (i) Let I² be partitioned by P1 = {B1, B2}, where B1 = {(u, v) ∈
I² : v ≤ (b2 − b1)u + b1} and B2 = {(u, v) ∈ I² : v ≥ (b2 − b1)u + b1} with
0 ≤ b1, b2 ≤ 1. Let b₋ = min{b1, b2} and b₊ = max{b1, b2}. Then the copula
generated by (P1, {C1, C2}, Π) is given by

$$
C_{b_1,b_2,C_1,C_2}(u,v)=
\begin{cases}
\dfrac{b_1+b_2}{2}\,C_1\!\Big(\dfrac{(b_2-b_1)u^2+2b_1u}{b_1+b_2},\,\dfrac{2v}{b_1+b_2}\Big), & \text{if } 0\le v\le b_-,\\[2ex]
\dfrac{b_1+b_2}{2}\,C_1\!\Big(\dfrac{(b_2-b_1)u^2+2b_1u}{b_1+b_2},\,\dfrac{2b_+v-v^2-b_-^2}{b_+^2-b_-^2}\Big)+\dfrac{2-b_1-b_2}{2}\,C_2\!\Big(\dfrac{(2-2b_1)u-(b_2-b_1)u^2}{2-b_1-b_2},\,\dfrac{(v-b_-)^2}{(b_+-b_-)(2-b_1-b_2)}\Big), & \text{if } b_-\le v\le b_+,\\[2ex]
\dfrac{(b_2-b_1)u^2+2b_1u}{2}+\dfrac{2-b_1-b_2}{2}\,C_2\!\Big(\dfrac{(2-2b_1)u-(b_2-b_1)u^2}{2-b_1-b_2},\,\dfrac{2v-b_1-b_2}{2-b_1-b_2}\Big), & \text{if } b_+\le v\le 1.
\end{cases} \qquad (4)
$$

(ii) Specifically, if b1 = 0 and b2 = 1, then

$$C_{0,1,C_1,C_2}(u,v)=0.5\,C_1\big(u^2,\,2v-v^2\big)+0.5\,C_2\big(2u-u^2,\,v^2\big). \qquad (5)$$

And, if b1 = 1 and b2 = 0, then

$$C_{1,0,C_1,C_2}(u,v)=0.5\,C_1\big(2u-u^2,\,2v-v^2\big)+0.5\,C_2\big(u^2,\,v^2\big). \qquad (6)$$
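As a numerical illustration of construction (5), the sketch below evaluates C_{0,1,C1,C2} for C1 = M and C2 = W (so C1 lives below the diagonal v = u and C2 above it) and checks the boundary and 2-increasing conditions on a grid; it is only a verification aid, with our own function names.

    import numpy as np

    def M(u, v): return np.minimum(u, v)
    def W(u, v): return np.maximum(u + v - 1.0, 0.0)

    def C01(u, v, C1=M, C2=W):
        """Construction (5): C1 glued on {v <= u}, C2 on {v >= u}, background Pi."""
        return 0.5 * C1(u**2, 2*v - v**2) + 0.5 * C2(2*u - u**2, v**2)

    t = np.linspace(0.0, 1.0, 201)
    u, v = np.meshgrid(t, t)
    Cuv = C01(u, v)

    # Boundary conditions C(u,1) = u and C(1,v) = v.
    assert np.allclose(C01(t, np.ones_like(t)), t) and np.allclose(C01(np.ones_like(t), t), t)
    # 2-increasing: all C-volumes of grid cells are nonnegative.
    vol = Cuv[1:, 1:] - Cuv[1:, :-1] - Cuv[:-1, 1:] + Cuv[:-1, :-1]
    assert vol.min() >= -1e-12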

Remark 2. Note that if b1 = b2 , then the above construction (4) is reduced to the
gluing construction of two copulas studied by [38]. In addition, the construction
(4) is different from constructions studied by [23,33], in which they studied
constructions of copulas with given affine sections and sub-diagonal sections,
respectively. As an example of constructions (5) and (6), graphs of C0,1,M,W
and C1,0,M,W are provided in Fig. 4.

[Fig. 4. Graphs of C_{0,1,M,W} and C_{1,0,M,W}.]

The diagonal section and opposite diagonal section of a copula C are the
functions δ_C : I → I and ω_C : I → I defined by δ_C(t) = C(t, t) and
ω_C(t) = C(t, 1 − t), respectively. A diagonal function is a function δ : I → I
such that δ(1) = 1, δ(t) ≤ t for all t ∈ I, and δ is increasing and 2-Lipschitz, i.e.,
|δ(t′) − δ(t)| ≤ 2|t′ − t| for all 0 ≤ t, t′ ≤ 1. An opposite diagonal function is a
function ω : I → I such that ω(t) ≤ min(t, 1 − t) for all t ∈ I and ω is 1-Lipschitz,
i.e., |ω(t′) − ω(t)| ≤ |t′ − t| for all 0 ≤ t, t′ ≤ 1. The (opposite) diagonal section of
a copula is an (opposite) diagonal function. Conversely, for any diagonal function δ
(or opposite diagonal function ω), there exist copulas with diagonal section δ (or
opposite diagonal section ω) [12]. For our constructions, we have the following
results.

Proposition 1. The diagonal sections and opposite diagonal sections of constructions (5) and (6) are given by

$$\delta_{0,1,C_1,C_2}(t)=0.5\,C_1(t^2,\,2t-t^2)+0.5\,C_2(2t-t^2,\,t^2),\qquad
\omega_{0,1,C_1,C_2}(t)=0.5\,\omega_{C_1}(t^2)+0.5\,\omega_{C_2}(2t-t^2), \qquad (7)$$

and

$$\delta_{1,0,C_1,C_2}(t)=0.5\,\delta_{C_1}(2t-t^2)+0.5\,\delta_{C_2}(t^2),\qquad
\omega_{1,0,C_1,C_2}(t)=0.5\,C_1(2t-t^2,\,1-t^2)+0.5\,C_2(t^2,\,(1-t)^2). \qquad (8)$$

The above results can be used to construct copulas with given diagonal or opposite
diagonal sections.

Example 3. Consider the diagonal function δ(t) = 2t³ − t⁴. Let C1(u, v) and
C2(u, v) be F-G-M copulas [3] given by

C1(u, v) = uv(1 + θ(1 − u)(1 − v))  and  C2(u, v) = uv(1 − θ(1 − u)(1 − v)),

where θ ∈ [−1, 1]. Then it can be shown that δ_{0,1,C1,C2}(t) = 2t³ − t⁴, which is
free of θ.
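A small symbolic check of this claim (a verification aid; the symbol names are ours, not the paper's):

    import sympy as sp

    t, theta = sp.symbols('t theta')
    C1 = lambda u, v: u*v*(1 + theta*(1 - u)*(1 - v))
    C2 = lambda u, v: u*v*(1 - theta*(1 - u)*(1 - v))

    # Diagonal section of construction (5): 0.5*C1(t^2, 2t - t^2) + 0.5*C2(2t - t^2, t^2)
    delta = sp.expand(sp.Rational(1, 2)*C1(t**2, 2*t - t**2) + sp.Rational(1, 2)*C2(2*t - t**2, t**2))
    print(delta)   # -t**4 + 2*t**3, independent of theta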

The convergence results for patchwork copulas were studied by several


researchers [8,13]. For our construction, we have the following result.

Proposition 2. If we fix the background copula C0 and the number of subsets
in P, then the copulas generated by (P, C, C0), for arbitrary collections of copulas
C and arbitrary collections of subsets P, converge to C0 under the uniform norm
||·||∞ as max_i{a_i = P0(Ai)} shrinks to 0.

Proof. First, note that ||M − W||∞ = 0.5, and that for the fixed C0 and any P =
{Ai}ⁿᵢ₌₁, by Sklar's theorem there are copulas C1^P, · · · , Cn^P such that

$$C_0(u,v)=\sum_{i=1}^{n} a_i\,C_i^{\mathcal P}\big(P_0([0,u]\times I\mid A_i),\,P_0(I\times[0,v]\mid A_i)\big)+P_0\big(([0,u]\times[0,v])\cap A^c\big).$$

For any ε > 0 and any P such that max_i{a_i = P0(Ai)} < 2ε/n,

$$\|C_{\mathcal P,\mathcal C,C_0}-C_0\|_\infty\le\sum_{i=1}^{n} a_i\,\|C_i-C_i^{\mathcal P}\|_\infty< n\cdot\frac{2\varepsilon}{n}\cdot\|M-W\|_\infty=\varepsilon.$$

The proof is completed. □

3.2 A Probabilistic Interpretation of the Construction


For a probabilistic interpretation of our construction, we need the concept of the
threshold copula, which was defined by [15] as follows.
Definition 2. Let X and Y be random variables defined on a probability space
(Ω, A, P). Suppose that B ⊆ R̄² is such that P({ω ∈ Ω : (X(ω), Y(ω)) ∈ B}) > 0.
Then the threshold copula of X and Y over B is a copula C_B such that

$$P(X\le x,\,Y\le y\mid (X,Y)\in B)=C_B\big(P(X\le x\mid (X,Y)\in B),\,P(Y\le y\mid (X,Y)\in B)\big) \qquad (9)$$

for all (x, y) ∈ B.
Remark 3. (i) From Definition 2, the local dependence of X and Y over B can be
captured by the threshold copula of X and Y over B. In [19,20,32], several
versions of conditional copulas were defined via conditional probabilities; they
are slightly different from the threshold copula.
(ii) In general, threshold copulas of a copula C over arbitrary subsets of I² are
different from C. For example, let B1 = {(u, v) ∈ I² : v ≤ u} and B2 = {(u, v) ∈
I² : v ≥ u}. Then it can be shown that the threshold copulas of Π over B1 and B2,
respectively, are

$$\Pi_{B_1}(u,v)=2\sqrt{u}\,\min\!\big\{\sqrt{u},\,1-\sqrt{1-v}\big\}-\Big(\min\!\big\{\sqrt{u},\,1-\sqrt{1-v}\big\}\Big)^2,$$

and

$$\Pi_{B_2}(u,v)=2\sqrt{v}\,\min\!\big\{1-\sqrt{1-u},\,\sqrt{v}\big\}-\Big(\min\!\big\{1-\sqrt{1-u},\,\sqrt{v}\big\}\Big)^2.$$

However, it can be proved that for any rectangle B = [a1, a2] × [b1, b2] ⊆ I² such
that V_C(B) > 0, where C = M, W or Π, the threshold copula of C over B
is C again.

Now, by using threshold copulas, a probabilistic interpretation of the con-


struction (2) can be given as follows. The proof is not difficult.
Theorem 2. Given a triple (P, C , C0 ), where P = {Ai }ni=1 and C = {Ci }ni=1 .
Then the threshold copula of CP ,C ,C0 over Ai is Ci , i = 1, · · · , n.

3.3 Best-Possible Bounds for the Construction

In view of the relationship between copulas and bivariate distributions with


given marginals, recently, the problem of finding bounds on classes of copulas
with prescribed properties has been studied by many researchers, e.g., given
values on some points of I 2 [25,35], given values on subsets of I 2 [6,14,41],
given diagonal sections [30,44], given measures of associations [5,29,31]. For the
construction (2), we have the following result.
Proposition 3. Fix a background copula C0 . Let P be a collection of subsets
of I 2 as in the first paragraph of this section. For arbitrary collection of copulas
C such that |C | = |P| = n and any (u, v) ∈ I 2 ,

CP ,W ,C0 (u, v) ≤ CP ,C ,C0 (u, v) ≤ CP ,M ,C0 (u, v),

where | · | is the cardinality, W = {W }ni=1 and M = {M }ni=1 .

Fig. 5. Contour plot of C_{A,W,M} − C_{A,W,C^{FGM}}

Since copulas with the same threshold copulas must have the same local
dependence, it is necessary to consider bounds on copulas with given threshold
copulas. Intuitively, for fixed subsets of I² and fixed threshold copulas, the
best-possible upper bound and lower bound should be generated by choosing the
background copulas C0 = M and C0 = W, respectively. However, the following
example shows that this may not hold in general, even when the global copulas are
well defined.

[Fig. 6. Graphs of C_{A,W,M} and C_{A,W,C^{FGM}}.]

Example 4. We equip the threshold copula W over A = [0, 0.5]². Then it can be
shown that C_{A,W,M} is not always greater than C_{A,W,C^{FGM}}, where C^{FGM}(u, v)
is the F-G-M copula given by C^{FGM}(u, v) = uv(1 + (1 − u)(1 − v)). For example,
C_{A,W,M}(0.2, 0.3) − C_{A,W,C^{FGM}}(0.2, 0.3) = −0.03. The graphs of C_{A,W,M} and
C_{A,W,C^{FGM}}, and the contour plot of C_{A,W,M} − C_{A,W,C^{FGM}}, are given in Figs. 6 and
5, respectively.
Although the best-possible bounds are not generated by choosing the background
copulas to be M or W in general, an ideal result holds if I² is partitioned by a
vertical or horizontal line.
Theorem 3. Given two copulas C1, C2 and a background copula C0, let b ∈
(0, 1), B1 = [0, b] × I and B2 = [b, 1] × I. Then

$$
C_{\{B_1,B_2\},\{C_1,C_2\},C_0}(u,v)=
\begin{cases}
b\,C_1\!\Big(\dfrac{u}{b},\,\dfrac{C_0(b,v)}{b}\Big), & \text{if } 0\le u\le b,\\[2ex]
C_0(b,v)+(1-b)\,C_2\!\Big(\dfrac{u-b}{1-b},\,\dfrac{v-C_0(b,v)}{1-b}\Big), & \text{if } b\le u\le 1,
\end{cases}
$$

and

$$C_{\{B_1,B_2\},\{C_1,C_2\},W}(u,v)\le C_{\{B_1,B_2\},\{C_1,C_2\},C_0}(u,v)\le C_{\{B_1,B_2\},\{C_1,C_2\},M}(u,v) \qquad (10)$$

for all copulas C0 and (u, v) ∈ I². Similar results also hold if I² is partitioned
by I × [0, b] and I × [b, 1].

Proof. For any (u, v) ∈ I², W(u, v) ≤ C0(u, v) ≤ M(u, v). So if 0 ≤ u ≤ b, we have

$$b\,C_1\!\Big(\frac{u}{b},\frac{W(b,v)}{b}\Big)\le b\,C_1\!\Big(\frac{u}{b},\frac{C_0(b,v)}{b}\Big)\le b\,C_1\!\Big(\frac{u}{b},\frac{M(b,v)}{b}\Big).$$

If b ≤ u ≤ 1, consider the function f : [W(b, v), M(b, v)] → I such that f(x) =
x + (1 − b)C2((u − b)/(1 − b), (v − x)/(1 − b)). Then f is nondecreasing. Indeed, for any W(b, v) ≤
x ≤ x′ ≤ M(b, v),

$$
f(x')-f(x)=x'-x+(1-b)\Big[C_2\Big(\frac{u-b}{1-b},\frac{v-x'}{1-b}\Big)-C_2\Big(\frac{u-b}{1-b},\frac{v-x}{1-b}\Big)\Big]
\ \ge\ x'-x-(1-b)\Big|\frac{v-x'}{1-b}-\frac{v-x}{1-b}\Big|=0.
$$

Thus, C_{{B1,B2},{C1,C2},W}(u, v) ≤ C_{{B1,B2},{C1,C2},C0}(u, v) ≤ C_{{B1,B2},{C1,C2},M}(u, v)
for all (u, v) ∈ I². Completely analogous steps prove the case when I² is
partitioned by I × [0, b] and I × [b, 1]. □
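The explicit formula in Theorem 3 is easy to check numerically; the sketch below implements the vertical-split construction and verifies the ordering (10) for one choice of C1, C2 and background copulas (an illustration only, with our own function names).

    import numpy as np

    def Pi(u, v): return u * v
    def M(u, v):  return np.minimum(u, v)
    def W(u, v):  return np.maximum(u + v - 1.0, 0.0)

    def vertical_glue(u, v, C1, C2, C0, b=0.4):
        """Theorem 3: C1 on [0,b] x I and C2 on [b,1] x I, with background copula C0."""
        left  = b * C1(u / b, C0(b, v) / b)
        right = C0(b, v) + (1 - b) * C2((u - b) / (1 - b), (v - C0(b, v)) / (1 - b))
        return np.where(u <= b, left, right)

    t = np.linspace(0.001, 0.999, 200)
    u, v = np.meshgrid(t, t)
    lower = vertical_glue(u, v, M, W, W)
    mid   = vertical_glue(u, v, M, W, Pi)
    upper = vertical_glue(u, v, M, W, M)
    assert np.all(lower <= mid + 1e-12) and np.all(mid <= upper + 1e-12)   # inequality (10)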

4 Conclusions

In this paper, a flexible method for constructing bivariate copulas is presented,


which includes several constructions as special cases. Properties and a proba-
bilistic interpretation of the construction are given and best-possible bounds of
copulas with given threshold copulas are studied. However, there are still open
problems about this work.

(i) In Proposition 2, the convergence of our constructions is discussed under


the uniform norm || · ||∞ when the number n of subsets in P is fixed. Does
an analogous result hold when n is not fixed and/or under other norms,
e.g., the Sobolev norm [9] for bivariate copulas?
(ii) From Remark 3, the threshold copulas of Π, M and W over rectangles, if they exist,
are these copulas themselves. Does any other family of copulas have this property?
Or under what conditions will the threshold copulas belong to the same family
as their global copula?
(iii) Example 4 shows that analogues of Theorem 3 do not hold in general.
Can we find best-possible bounds on copulas with given threshold copulas
over arbitrary subsets of I²? Are the two bounds still copulas?
(iv) The vine copula [4] or pair-copula construction [1] is a flexible method to
construct multivariate copulas from bivariate copulas. Is it possible to com-
bine our method and vine copula method together to construct multivariate
copulas with desired local and pairwise dependence properties?
These problems will be the objects of our future research.

Acknowledgments. The authors would like to thank Dr. S. Tasena at Chiang Mai
University, Thailand, for his valuable discussions and suggestions during the prepara-
tion of this paper.

References
1. Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-copula constructions of multiple
dependence. Insur.: Math. Econ. 44(2), 182–198 (2009)
2. Abbas, A.E.: Utility copula functions matching all boundary assessments. Oper.
Res. 61(2), 359–371 (2013)
3. Balakrishnan, N.: Continuous Multivariate Distributions. Wiley Online Library,
Hoboken (2006)
4. Bedford, T., Cooke, R.M.: Vines: a new graphical model for dependent random
variables. Ann. Stat. 30(4), 1031–1068 (2002)
5. Beliakov, G., De Baets, B., De Meyer, H., Nelsen, R., Úbeda-Flores, M.: Best-
possible bounds on the set of copulas with given degree of non-exchangeability. J.
Math. Anal. Appl. 417(1), 451–468 (2014)
6. Bernard, C., Jiang, X., Vanduffel, S.: A note on improved Fréchet bounds and
model-free pricing of multi-asset options by Tankov (2011). J. Appl. Probab. 49(3),
866–875 (2012)
7. Boonmee, T., Tasena, S.: Measure of complete dependence of random vectors. J.
Math. Anal. Appl. 443(1), 585–595 (2016)
8. Chaidee, N., Santiwipanont, T., Sumetkijakan, S.: Patched approximations and
their convergence. Commun. Stat.-Theory Methods 45(9), 2654–2664 (2016)
9. Darsow, W.F., Olsen, E.T.: Norms for copulas. Int. J. Math. Math. Sci. 18(3),
417–436 (1995)
10. de Amo, E., Carrillo, M.D., Fernández-Sánchez, J.: Characterization of all copulas
associated with non-continuous random variables. Fuzzy Sets Syst. 191, 103–112
(2012)
11. De Baets, B., De Meyer, H.: Orthogonal grid constructions of copulas. IEEE Trans.
Fuzzy Syst. 15(6), 1053–1062 (2007)
12. De Baets, B., De Meyer, H., Úbeda-Flores, M.: Constructing copulas with given
diagonal and opposite diagonal sections. Commun. Stat.-Theory Methods 40(5),
828–843 (2011)
13. Durante, F., Fernández-Sánchez, J., Quesada-Molina, J.J., Úbeda-Flores, M.: Con-
vergence results for patchwork copulas. Eur. J. Oper. Res. 247(2), 525–531 (2015)
14. Durante, F., Fernández-Sánchez, J., Quesada-Molina, J.J., Úbeda-Flores, M.: Cop-
ulas with given values on the tails. Int. J. Approx. Reason. 85, 59–67 (2017)
15. Durante, F., Jaworski, P.: Spatial contagion between financial markets: a copula-
based approach. Appl. Stoch. Models Bus. Ind. 26(5), 551–564 (2010)
16. Durante, F., Saminger-Platz, S., Sarkoci, P.: Rectangular patchwork for bivariate
copulas and tail dependence. Commun. Stat.-Theory Methods 38(15), 2515–2527
(2009)
17. Durante, F., Sánchez, J.F., Sempi, C.: Multivariate patchwork copulas: a unified
approach with applications to partial comonotonicity. Insur.: Math. Econ. 53(3),
897–905 (2013)
18. Durante, F., Sempi, C.: Principles of Copula Theory. CRC Press, Boca Raton
(2015)
19. Fermanian, J.-D., Wegkamp, M.H.: Time-dependent copulas. J. Multivar. Anal.
110, 19–29 (2012)
20. Gijbels, I., Veraverbeke, N., Omelka, M.: Conditional copulas, association measures
and their applications. Comput. Stat. Data Anal. 55(5), 1919–1932 (2011)
21. Gupta, N., Misra, N., Kumar, S.: Stochastic comparisons of residual lifetimes and
inactivity times of coherent systems with dependent identically distributed com-
ponents. Eur. J. Oper. Res. 240(2), 425–430 (2015)

22. Joe, H.: Multivariate Models and Multivariate Dependence Concepts. CRC Press,
Boca Raton (1997)
23. Klement, E.P., Kolesárová, A.: Intervals of 1-lipschitz aggregation operators, quasi-
copulas, and copulas with given affine section. Monatshefte für Mathematik 152(2),
151–167 (2007)
24. Malevergne, Y., Sornette, D.: Extreme Financial Risks: From Dependence to Risk
Management. Springer Science & Business Media, Heidelberg (2006)
25. Mardani-Fard, H., Sadooghi-Alvandi, S., Shishebor, Z.: Bounds on bivariate distri-
bution functions with given margins and known values at several points. Commun.
Stat.-Theory Methods 39(20), 3596–3621 (2010)
26. Navarro, J., Pellerey, F., Di Crescenzo, A.: Orderings of coherent systems with
randomized dependent components. Eur. J. Oper. Res. 240(1), 127–139 (2015)
27. Nelsen, R.B.: An Introduction to Copulas. Springer Science & Business Media,
Heidelberg (2007)
28. Nelsen, R.B., Quesada-Molina, J.J., Rodrı́guez-Lallena, J.A., Úbeda-Flores, M.:
On the construction of copulas and quasi-copulas with given diagonal sections.
Insur.: Math. Econ. 42(2), 473–483 (2008)
29. Nelsen, R.B., Quesada-Molina, J.J., Rodriı́guez-Lallena, J.A., Úbeda-Flores, M.:
Bounds on bivariate distribution functions with given margins and measures of
association. Commun. Stat.-Theory Methods 30(6), 1055–1062 (2001)
30. Nelsen, R.B., Quesada-Molina, J.J., Rodriı́guez-Lallena, J.A., Úbeda-Flores, M.:
Best-possible bounds on sets of bivariate distribution functions. J. Multivar. Anal.
90(2), 348–358 (2004)
31. Nelsen, R.B., Úbeda-Flores, M.: A comparison of bounds on sets of joint distribu-
tion functions derived from various measures of association. Commun. Stat.-Theory
Methods 33(10), 2299–2305 (2005)
32. Patton, A.J.: Modelling asymmetric exchange rate dependence. Int. Econ. Rev.
47(2), 527–556 (2006)
33. Quesada-Molina, J.J., Saminger-Platz, S., Sempi, C.: Quasi-copulas with a given
sub-diagonal section. Nonlinear Anal.: Theory Methods Appl. 69(12), 4654–4673
(2008)
34. Rychlik, T.: Copulae in reliability theory (order statistics, coherent systems). In:
Copula Theory and Its Applications, pp. 187–208. Springer, Heidelberg (2010)
35. Sadooghi-Alvandi, S., Shishebor, Z., Mardani-Fard, H.: Sharp bounds on a class
of copulas with known values at several points. Commun. Stat.-Theory Methods
42(12), 2215–2228 (2013)
36. Salvadori, G., De Michele, C., Kottegoda, N.T., Rosso, R.: Extremes in Nature: An
Approach Using Copulas, vol. 56. Springer Science & Business Media, Heidelberg
(2007)
37. Schweizer, B., Wolff, E.F.: On nonparametric measures of dependence for random
variables. Ann. Stat. 9(4), 879–885 (1981)
38. Siburg, K.F., Stoimenov, P.A.: Gluing copulas. Commun. Stat.-Theory Methods
37(19), 3124–3134 (2008)
39. Siburg, K.F., Stoimenov, P.A.: A measure of mutual complete dependence. Metrika
71(2), 239–251 (2010)
40. Sklar, M.: Fonctions de répartition á n dimensions et leurs marges. Université Paris
8 (1959)
41. Tankov, P.: Improved fréchet bounds and model-free pricing of multi-asset options.
J. Appl. Probab. 48(2), 389–403 (2011)
42. Tasena, S., Dhompongsa, S.: A measure of multivariate mutual complete depen-
dence. Int. J. Approx. Reason. 54(6), 748–761 (2013)

43. Tasena, S., Dhompongsa, S.: Measures of the functional dependence of random
vectors. Int. J. Approx. Reason. 68, 15–26 (2016)
44. Úbeda-Flores, M.: On the best-possible upper bound on sets of copulas with given
diagonal sections. Soft Comput. Fusion Found. Methodol. Appl. 12(10), 1019–1025
(2008)
45. Wei, Z., Kim, D.: On multivariate asymmetric dependence using multivariate skew-
normal copula-based regression. Int. J. Approx. Reason. 92, 376–391 (2018)
46. Wei, Z., Wang, T., Nguyen, P.A.: Multivariate dependence concepts through cop-
ulas. Int. J. Approx. Reason. 65, 24–33 (2015)
47. Wisadwongsa, S., Tasena, S.: Bivariate quadratic copula constructions. Int. J.
Approx. Reason. 92, 1–19 (2018)
48. Zhu, X., Wang, T., Choy, S.B., Autchariyapanitkul, K.: Measures of mutually
complete dependence for discrete random vectors. In: Predictive Econometrics and
Big Data, pp. 303–317. Springer, Heidelberg (2018)
49. Zhu, X., Wang, T., Pipitpojanakarn, V.: Constructions of multivariate copulas. In:
Robustness in Econometrics, pp. 249–265. Springer, Heidelberg (2017)
Desired Sample Size for Estimating
the Skewness Under Skew Normal
Settings

Cong Wang1, Tonghui Wang1(B), David Trafimow2, and Hunter A. Myüz2

1 Department of Mathematical Sciences, New Mexico State University, Las Cruces, USA
{cong960,twang}@nmsu.edu
2 Department of Psychology, New Mexico State University, Las Cruces, USA
{dtrafimo,hamz}@nmsu.edu

Abstract. In this paper, the desired sample size for estimating the
skewness parameter with given closeness and confidence level under skew
normal populations is obtained. Confidence intervals for the skewness parameter are
constructed based on the desired sample sizes using two pivots, the chi-square
distribution and the F-distribution. Computer simulations support our main
results. At the end, a real data example is provided to illustrate the
constructed confidence intervals.

Keywords: Skew normal · Confidence interval · Skewness parameter

1 Introduction
It is well known that many data sets from financial and biomedical fields have
skewed distributions. This is a reason why the classical normal distribution is
not so adequate to model the data from these areas even though it is popular and
easy to handle. For data that do not follow a normal distribution, it is natural
to consider the family of skew normal distributions, which extend the family of
normal distributions.
Azzalini [1] provided the probability density function of the skew normal dis-
tribution. Since Azzalini, the family of skew normal distributions has been stud-
ied by many researchers, see, e.g., Gupta [3,4], Vernic [10], and Wang et al. [11].
Theoretically, the skew normal family shares many properties of the normal distribution;
for example, if Z is a random variable with distribution SN(0, 1, α), then Z² ∼ χ²₁,
irrespective of the skewness parameter α.
Now suppose that we have a population from the skew normal family and
want to construct the confidence interval for the skewness parameter. Other than
using a significance test, we start from the question: how many participants are
needed so we can be confident that the sample skewness estimator is close to the
population skewness parameter? For the normal case, Trafimow [6] provided the
answer for the location parameter by fixing the probability of the difference of
the sample mean and the population mean being within f standard deviations, at
confidence level c. Trafimow [7] showed how to obtain the necessary sample size
to meet specifications for single means under the normal distribution; Trafimow
and MacDonald [8] extended this to k means; and Trafimow et al. [9] provided
a further extension to the family of skew-normal distributions, using locations
instead of means. Also, a way to estimate the desired sample size for simple
random sampling of a skewed population has been discussed by Gregoire and
Affleck [5].
In this paper, we consider the skewness parameter from a skew normal popu-
lation. The paper is organized as follows. Some properties of skew normal distri-
bution are listed in Sect. 2. The methods for deriving the least required sample
size are obtained in Sect. 3, and the simulation work is provided is Sect. 4.

2 Some Properties of the Skew Normal Distribution


Definition 2.1. A random variable Z is said to have the standard skew normal
distribution, denoted by Z ∼ SN (0, 1, λ), if its probability density function is

fZ (z) = 2φ(z)Φ(λz), (1)

where φ(·) and Φ(·) are the density and cumulative distribution function of the
standard normal distribution, respectively. λ is called the skewness parameter.
There is an alternative representation of Z ∼ SN(0, 1, λ), given in the following
lemma.

Lemma 1. Suppose that δ is an arbitrary value from the interval (−1, 1). If
U0, U1 are independent standard normal variables, then

$$Z=\delta|U_0|+\sqrt{1-\delta^2}\,U_1\ \sim\ SN(0,1,\lambda),\qquad \text{where } \lambda=\frac{\delta}{\sqrt{1-\delta^2}}.$$

For more details, see Azzalini’s book [2].
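A minimal sketch of this stochastic representation for generating skew normal variates (note that each draw below uses its own independent |Z0|, as in Lemma 1; Proposition 2.1 below instead shares a single |Z0| across the whole sample):

    import numpy as np

    def rskewnormal(n, delta, omega=1.0, xi=0.0, seed=None):
        """Draw n independent variates X = xi + delta*|Z0| + sqrt(1 - delta^2)*Z1 with
        Z0, Z1 ~ N(0, omega^2), so that X ~ SN(xi, omega^2, lambda), lambda = delta/sqrt(1-delta^2)."""
        rng = np.random.default_rng(seed)
        z0 = rng.normal(0.0, omega, size=n)
        z1 = rng.normal(0.0, omega, size=n)
        return xi + delta * np.abs(z0) + np.sqrt(1.0 - delta**2) * z1

    x = rskewnormal(10_000, delta=0.7, seed=1)
    print(x.mean(), x.std())   # rough checks against the known SN moments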


The extension of this lemma is listed below.
Proposition 2.1. Let Z0, Z1, ..., Zn be a random sample from the normal
distribution with mean 0 and variance ω², and let

$$X_i=\delta|Z_0|+\sqrt{1-\delta^2}\,Z_i$$

for i = 1, ..., n, where δ is a value from the interval (−1, 1). Then X1, ..., Xn are
identically distributed, i.e.,

$$X_i\sim SN(0,\omega^2,\lambda)$$

for all i = 1, ..., n, and if we denote X̄ = (1/n) Σⁿᵢ₌₁ Xi, then

$$\bar X\sim SN(0,\omega_*^2,\sqrt{n}\,\lambda),\qquad \omega_*^2=\frac{1+(n-1)\delta^2}{n}\,\omega^2.$$

Proof. Since Z0 ∼ N(0, ω²), the moment generating function of |Z0| is

$$M_{|Z_0|}(t)=2\exp\!\Big(\frac{\omega^2t^2}{2}\Big)\Phi(\omega t).$$

Note that Z0 and the Zi's are independent. We find that all Xi's, i = 1, ..., n,
have the same moment generating function, namely

$$M_{X_i}(t)=2\exp\!\Big(\frac{\omega^2t^2}{2}\Big)\Phi(\delta\omega t).$$

Since λ = δ/√(1−δ²), we have δ = λ/√(1+λ²). Similarly, the distribution of X̄ can be obtained. □


Proposition 2.2. Let Z0, Z1, ..., Zn be a random sample from the normal
distribution with mean 0 and variance ω², and let

$$X_i=\xi+\delta|Z_0|+\sqrt{1-\delta^2}\,Z_i$$

for i = 1, ..., n, where δ is a value from the interval (−1, 1) and ξ ∈ R. Then
X1, ..., Xn are identically distributed, i.e.,

$$X_i\sim SN(\xi,\omega^2,\lambda)$$

for all i = 1, ..., n.

Proof. The proof follows immediately from the moment generating function in the proof
of the above proposition. □

Proposition 2.3. Under the conditions given in Proposition 2.1, let X̄ and S²
be the sample mean and sample variance, respectively.
(i) Then S² and X̄ are independent.
(ii) Let Y = X̄²/S². Then the mean and standard deviation of Y are

$$E(Y)=\frac{n-1}{n-3}\cdot\frac{1+(n-1)\delta^2}{n(1-\delta^2)},\qquad \sigma_Y=E(Y)\sqrt{\frac{2(n-2)}{n-5}},$$

provided n > 5.
Proof. Note that

$$S^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar X)^2=(1-\delta^2)S_Z^2,$$

where S_Z² = (1/(n−1)) Σⁿᵢ₌₁ (Zi − Z̄)². Since Z1, ..., Zn are independent normal
random variables, the sample mean Z̄ and the sample variance S_Z² are independent.
Also, since |Z0| is independent of Z̄ and S_Z², we obtain that X̄ = δ|Z0| + √(1−δ²) Z̄
and S² are independent.

From Proposition 2.1, X̄ ∼ SN(0, [1 + (n−1)δ²]ω²/n, √n λ). Then

$$\frac{\sqrt{n}\,\bar X}{\sqrt{1+(n-1)\delta^2}\,\omega}\ \sim\ SN(0,1,\sqrt{n}\,\lambda).$$

Let

$$V_1=\frac{n\bar X^2}{[1+(n-1)\delta^2]\omega^2},\qquad V_2=\frac{(n-1)S^2}{(1-\delta^2)\omega^2}.$$

Then V1 ∼ χ²₁. Note that V2 ∼ χ²ₙ₋₁ and that V1 and V2 are independent. If we
denote T = (n−1)V1/V2, then T has the F-distribution with degrees of freedom
1 and n − 1. Thus, by the mean and standard deviation of the F-distribution, the
results are obtained. □

3 The Sample Size Needed for a Given Sampling Precision
Suppose that Z0, Z1, ..., Zn is a random sample from the normal distribution
with mean 0 and variance ω². Let

$$X_i=\xi+\delta|Z_0|+\sqrt{1-\delta^2}\,Z_i$$

for i = 1, ..., n, with δ a value from the interval (−1, 1). From Proposition 2.2,
X1, ..., Xn are dependent random variables from a skew normal population
with location parameter ξ, scale parameter ω² and skewness parameter λ. It is
clear that there is a one-to-one correspondence between λ and δ. In this paper,
we focus on the confidence interval for δ² with known ξ, so that λ can be obtained
by solving the equation relating λ and δ. Without loss of generality, we assume
ξ = 0 in this paper.

3.1 The Sample Size Needed for a Given Sampling Precision with Known ω²
In order to determine the minimum sample size n needed to be c × 100% confident
under the given sampling precision, we consider the distribution of the sample
variance S² defined above. Let γ² = 1 − δ²; then the standard deviation of S² is
√(2/(n−1)) γ²ω², which is proportional to γ²ω².

Theorem 3.1. Let c be the confidence level and f be the precision, which are
specified such that the error associated with the estimator S² is E = f γ²ω². More
specifically, if

$$P\big[\,f_1\gamma^2\omega^2\le S^2-E(S^2)\le f_2\gamma^2\omega^2\,\big]=c \qquad (2)$$

where f1 and f2 are restricted by max(|f1|, f2) ≤ f, and E(S²) is the expectation
of S², then the minimum sample size required can be obtained from

$$\int_L^U f(z)\,dz=c, \qquad (3)$$

such that U − L is minimized, where f(z) is the probability density function of
the chi-square distribution with n − 1 degrees of freedom, and

$$L=(n-1)(f_1+1),\qquad U=(n-1)(f_2+1).$$

Proof. Since S² = γ²S_Z², we have E(S²) = γ²ω². Then, simplifying (2), we obtain

$$P\big[(n-1)(f_1+1)\le Z\le (n-1)(f_2+1)\big]=c,$$

where Z = (n−1)S²/(γ²ω²) has the chi-square distribution with n − 1 degrees of
freedom. If we denote L = (n−1)(f1+1) and U = (n−1)(f2+1), then the required
n can be solved for from the integral equation (3). □

From Theorem 3.1, we have the following remark.

Remark 3.1. The value of n obtained is unique together with f1 and f2. Also,
if the conditions in Theorem 3.1 are satisfied, we can construct a c × 100% confidence
interval for δ² given by

$$\Big[\,1-\frac{S^2}{(f_1+1)\omega^2},\ \ 1-\frac{S^2}{(f_2+1)\omega^2}\,\Big].$$
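The minimum n of Theorem 3.1 can be found numerically. The sketch below is our own reading of the criterion, not the authors' code: for each n it finds the shortest chi-square interval with coverage c and checks whether the implied (f1, f2) satisfy max(|f1|, f2) ≤ f.

    import numpy as np
    from scipy.stats import chi2

    def min_sample_size(c=0.95, f=0.4, n_max=1000):
        """Smallest n such that the shortest 100c% interval [L, U] for a chi2(n-1)
        variable satisfies |L/(n-1) - 1| <= f and U/(n-1) - 1 <= f."""
        for n in range(6, n_max + 1):
            df = n - 1
            # Shortest interval with coverage c: slide the lower tail probability a.
            a_grid = np.linspace(1e-6, 1 - c - 1e-6, 2000)
            lo = chi2.ppf(a_grid, df)
            hi = chi2.ppf(a_grid + c, df)
            k = np.argmin(hi - lo)
            f1, f2 = lo[k] / df - 1.0, hi[k] / df - 1.0
            if max(abs(f1), f2) <= f:
                return n, f1, f2
        return None

    # For c = 0.95, f = 0.4 this should land near the value n = 53 reported in Table 1.
    print(min_sample_size(c=0.95, f=0.4))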

3.2 The Sample Size Needed for a Given Sampling Precision with Unknown ω²
From the proof of Proposition 2.3(ii), we know that S² and X̄² both belong to the
(scaled) chi-square family. We can therefore discuss the confidence interval for δ²
by using the F-distribution when ω² is unknown. Moreover, it was shown that
the standard deviation of X̄²/S² is proportional to [1 + (n−1)δ²]/[n(1−δ²)].

Theorem 3.2. Let c be the confidence level and f be the precision, which are
specified such that the error associated with the estimator X̄²/S² is
E = f [1 + (n−1)δ²]/(√n (1−δ²)). More specifically, if

$$P\Big[\,f_1\,\frac{1+(n-1)\delta^2}{\sqrt{n}(1-\delta^2)}\le\frac{\bar X^2}{S^2}-E\Big(\frac{\bar X^2}{S^2}\Big)\le f_2\,\frac{1+(n-1)\delta^2}{\sqrt{n}(1-\delta^2)}\,\Big]=c \qquad (4)$$

where f1 and f2 are restricted by max(|f1|, f2) ≤ f and E(X̄²/S²) is the expectation
of X̄²/S², then the minimum sample size required (n > 3) can be obtained from

$$\int_L^U f(t)\,dt=c,$$

so that U − L is minimized, where f(t) is the probability density function of the
F-distribution with 1 and n − 1 degrees of freedom, and

$$L=\frac{n-1}{n-3}+\sqrt{n}\,f_1,\qquad U=\frac{n-1}{n-3}+\sqrt{n}\,f_2.$$

Proof. In Proposition 2.3 we obtained the mean of X̄²/S², and in its proof we
constructed the F-distributed statistic based on X̄²/S². Simplifying (4), we get

$$P(L\le T\le U)=c,$$

where T has the F-distribution with degrees of freedom 1 and n − 1, and

$$L=\frac{n-1}{n-3}+\sqrt{n}\,f_1,\qquad U=\frac{n-1}{n-3}+\sqrt{n}\,f_2. \qquad \square$$

From this theorem, we have the following remark.

Remark 3.2. The value of n obtained is unique together with f1 and f2. Also,
if the conditions in Theorem 3.2 are satisfied, we can construct a c × 100% confidence
interval for δ² given by

$$\Big[\,\frac{n\bar X^2-US^2}{n\bar X^2+US^2},\ \ \frac{n\bar X^2-LS^2}{n\bar X^2+LS^2}\,\Big],$$

where L and U are the same as in Theorem 3.2.
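Given the sample statistics, the interval of Remark 3.2 is immediate to compute; a sketch with placeholder values of X̄ and S² (the f1, f2 values are taken from Table 2 for c = 0.95, f = 0.4):

    import math

    def delta2_ci(n, xbar, s2, f1, f2):
        """Confidence interval for delta^2 from Remark 3.2 (unknown omega^2 case)."""
        L = (n - 1) / (n - 3) + math.sqrt(n) * f1
        U = (n - 1) / (n - 3) + math.sqrt(n) * f2
        lower = (n * xbar**2 - U * s2) / (n * xbar**2 + U * s2)
        upper = (n * xbar**2 - L * s2) / (n * xbar**2 + L * s2)
        return lower, upper

    # Placeholder sample mean and variance; n, f1, f2 from Table 2 (c = 0.95, f = 0.4).
    print(delta2_ci(n=56, xbar=1.1, s2=0.8, f1=-0.1387, f2=0.3981))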

4 The Simulation Work


We perform computer simulations to support the derivation in Sect. 3. For given
confidence level c = 0.9, 0.95 and precision f = 0.2, 0.4, 0.6, 0.8, 1, the required
values of sample size n, f1 and f2 for the known ω 2 case are listed in Table 1. For
each n and confidence level c, Table 1 also lists the lengths of shortest (SL) and

Table 1. The value of sample size n, left precision f1 , right precision f2 and the lengths
of the intervals for shortest (SL) and equal tail (EL) cases with different precision f
for confidence level c = 0.95, 0.9 when ω 2 is known.

f c n f1 f2 SL EL
0.2 0.9 139 −0.1999 0.1971 54.4 54.56
0.95 201 −0.1935 0.1992 78.15 78.33
0.4 0.9 37 −0.3961 0.3857 27.38 27.72
0.95 53 −0.3759 0.3983 39.5 39.84
0.6 0.9 18 −0.5835 0.5632 18.41 18.91
0.95 25 −0.5486 0.5999 26.46 26.96
0.8 0.9 11 −0.7681 0.7383 13.69 14.37
0.95 16 −0.6886 0.7754 20.58 21.23
1 0.9 7 −0.9952 0.9644 10.08 10.96
0.95 11 −0.8345 0.9748 16.45 17.24

Table 2. The value of sample size n, left precision f1 , right precision f2 and the length
of the interval L with different precision f for the given c = 0.95, 0.9 when ω 2 is
unknown .

f c n f1 f2 L
0.2 0.9 76 −0.117 0.1990 2.77
0.95 207 −0.0702 0.1999 3.89
0.4 0.9 22 −0.2356 0.3956 2.96
0.95 56 −0.1387 0.3981 4.02
0.6 0.9 12 −0.3528 0.5784 3.23
0.95 28 −0.2041 0.5915 4.21
0.8 0.9 8 −0.4950 0.7743 3.59
0.95 18 −0.2671 0.7820 4.45
1 0.9 6 −0.6804 0.9771 4.06
0.95 13 −0.3328 0.9840 4.75

Table 3. The relative frequency for confidence intervals with confidence level c = 0.95,
ω 2 = 1, and location parameter ξ = 0, under different precision f and δ 2 .

f n δ 2 = 0 δ 2 = 0.1 δ 2 = 0.2 δ 2 = 0.5


0.2 201 0.9490 0.9487 0.9467 0.9515
0.4 53 0.9450 0.9514 0.9526 0.9496
0.6 25 0.9513 0.9489 0.9451 0.9506
0.8 16 0.9476 0.9455 0.9486 0.9483
1 11 0.9523 0.949 0.9478 0.9497

equal tail (EL) intervals. Table 2 shows the values of n, f1 and f2 for the unknown
ω² case, where L is the length of the interval, since the length of the shortest
interval is the same as that of the one-tail case.
Using Monte Carlo simulations, we record the relative frequency of coverage for
different values of δ². Table 3 shows the results for the relative frequency of 95%
confidence intervals for f = 0.2, 0.4, 0.6, 0.8, 1, ω² = 1 and δ² = 0, 0.1, 0.2, 0.5,
and Table 4 gives the corresponding results for the unknown ω² case. All results are
based on M = 10000 simulation runs. The next graphs (Figs. 1, 2) show
the density functions and the corresponding histograms for f = 0.4, 0.6 with
ω² = 1, respectively, where the brackets are the endpoints of the shortest 95%
confidence intervals and the parentheses are the endpoints for the equal tail case.
Figure 3 shows that of the F-distribution when ξ = 0 and f = 0.6, where the brackets
are the endpoints of the shortest 95% region.

Table 4. The relative frequency for confidence intervals with confidence level c = 0.95,
and location parameter ξ = 0, under different precision f and δ 2 for unknown ω 2 .

f n δ 2 = 0 δ 2 = 0.1 δ 2 = 0.2 δ 2 = 0.5


0.2 207 0.9553 0.9508 0.9499 0.9489
0.4 56 0.9483 0.9488 0.9524 0.9472
0.6 28 0.9540 0.9482 0.9486 0.9507
0.8 18 0.9501 0.9487 0.946 0.9499
1 13 0.9497 0.9466 0.9479 0.9491

Fig. 1. The density function and histogram of chi-square distribution for ξ = 0, ω 2 = 1


and f = 0.4, where the brackets are the endpoints of the shortest 95% region and the
parentheses are the endpoints of the 95% region for equal tail case.

Fig. 2. The density function and histogram of chi-square distribution for ξ = 0, ω 2 = 1


and f = 0.6, where the brackets are the endpoints of the shortest 95% region and the
parentheses are the endpoints of the 95% region for equal tail case.

Fig. 3. The density function and histogram of F-distribution for ξ = 0 and f = 0.6,
where the brackets are the endpoints of the shortest 95% region.

5 An Example with Real Data

We provide an example to illustrate our results in this section. The data
set, which is provided in the appendix, was obtained from a study of the leaf area
index (LAI) of Robinia pseudoacacia in the Huaiping forest farm of Shaanxi
Province from June to October 2010 (with permission of the authors). The estimated
distribution based on the data set is SN(1.2585, 1.8332², 2.4929) by
using Inferential Models, and SN(1.2585, 1.8332², 2.7966) via the MME.
For the details, see Zhu et al. [13] and Ye et al. [12]. Now suppose the population
scale parameter is ω² = 1.8332², and consider the precision f = 0.4 and
confidence level c = 0.95; then the desired sample size is 53 by Table 1. Randomly
choosing a sample of size 53 gives the sample variance S² = 0.6398. Then, by
Remark 3.1, the 95% confidence interval for δ² is [0.6984, 0.8631], from which
the 95% confidence interval for the skewness parameter λ is obtained as
[1.5217, 2.5109]; for n = 53, the 95% confidence interval for λ under the
equal tail case is [1.5597, 2.5410].
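The conversion from the confidence interval for δ² to the one for λ uses λ = δ/√(1−δ²); a two-line check of the numbers above:

    import math

    def delta2_to_lambda(d2):
        """Map delta^2 to the skewness parameter lambda = delta / sqrt(1 - delta^2)."""
        return math.sqrt(d2) / math.sqrt(1.0 - d2)

    print([round(delta2_to_lambda(d2), 4) for d2 in (0.6984, 0.8631)])   # [1.5217, 2.5109]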

Appendix 1
The data set of the LAI obtained from June to October, 2010

Jun. Jul. Sep. Oct.


4.87 3.32 2.05 1.50
5.00 3.02 2.12 1.46
4.72 3.28 2.24 1.55
5.16 3.63 2.56 1.27
5.11 3.68 2.67 1.26
5.03 3.79 2.61 1.37
5.36 3.68 2.42 1.87
5.17 4.06 2.58 1.75
5.56 4.13 2.56 1.81
4.48 2.92 1.84 1.98
4.55 3.05 1.94 1.89
4.69 3.02 1.95 1.71
2.54 2.78 2.29 1.29
3.09 2.35 1.94 1.34
2.79 2.40 2.20 1.29
3.80 3.28 1.56 1.10
3.61 3.45 1.40 1.04
3.53 2.85 1.36 1.08
2.51 3.05 1.60 0.86
2.41 2.78 1.50 0.70
2.80 2.72 1.88 0.82
3.23 2.64 1.63 1.19
3.46 2.88 1.66 1.24
3.12 3.00 1.62 1.14

References
1. Azzalini, A.: A class of distributions which includes the normal ones. Scand. J.
Stat. 12(2), 171–178 (1985)
2. Azzalini, A., Capitanio, A.: The Skew-Normal and Related Families, vol. 3. Cam-
bridge University Press, Cambridge (2013)
3. Gupta, A.K., Chang, F.C.: Multivariate skew symmetric distributions. Appl. Math.
Lett. 16, 643–646 (2003)
4. Gupta, A.K., Gouzalez, G., Dominguez-Molina, J.A.: A multivariate skew normal
distribution. J. Multivariate Anal. 82, 181–190 (2004)

5. Gregoire, T., Affleck, D.: Estimating desired sample size for simple random sam-
pling of a skewed population. Am. Stat. (2017). https://doi.org/10.1080/00031305.
2017.1290548
6. Trafimow, D.: Using the coefficient of confidence to make the philosophical switch
from a posteriori to a priori inferential statistics. Educ. Psychol. Measur. (2016)
7. Trafimow, D.: Using the coefficient of confidence to make the philosophical switch
from a posteriori to a priori inferential statistics. Educ. Psychol. Measur. 77(5),
831–854 (2017)
8. Trafimow, D., MacDonald, J.A.: Performing inferential statistics prior to data col-
lection. Educ. Psychol. Measur. 77(2), 204–219 (2017)
9. Trafimow, D., Wang, T., Wang, C.: From a sampling precision perspective, skew-
ness is a friend and not an enemy! Educ. Psychol. Measur. 1–22 (2018). https://
doi.org/10.1177/0013164418764801
10. Vernic, R.: Multivariate skew-normal distributions with applications in insurance.
Insur. Math. Econ. 38, 413–426 (2006)
11. Wang, T., Li, B., Gupta, A.K.: Distribution of quadratic forms under skew normal
settings. J. Multivariate Anal. 100, 533–545 (2009)
12. Ye, R., Wang, T.: Inferences in linear mixed models with skew-normal random
effects. Acta Mathematica Sinica English Series 31(4), 576–594 (2015)
13. Zhu, X., Ma, Z., Wang, T., Teetranont, T.: Plausibility regions on the skewness
parameter of skew normal distributions based on Inferential Models. In: Robustness
in Econometrics. Studies in Computational Intelligence, vol. 692 (2017). https://
doi.org/10.1007/978-3-319-50742-216
Why the Best Predictive Models
Are Often Different from the Best
Explanatory Models: A Theoretical
Explanation

Songsak Sriboonchitta1, Luc Longpré2, Vladik Kreinovich2(B), and Thongchai Dumrongpokaphan3

1 Faculty of Economics, Chiang Mai University, Chiang Mai, Thailand
songsakecon@gmail.com
2 University of Texas at El Paso, El Paso, TX 79968, USA
{longpre,vladik}@utep.edu
3 Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
tcd43@hotmail.com

Abstract. Traditionally, in statistics, it was implicitly assumed that


models which are the best predictors also have the best explanatory
power. Lately, many examples have been provided that show that the
best predictive models are often different from the best explanatory mod-
els. In this paper, we provide a theoretical explanation for this difference.

1 Formulation of the Problem

Predictive Models vs. Explanatory Models: A Traditional Confusion.


Traditionally, many researchers who have applied statistical methods implicitly
assumed that predictive and explanatory powers are strongly correlated:

• they assumed that a statistical model that leads to accurate predictions also
provides a good explanation for the corresponding phenomenon, and
• they also assumed that models providing a good explanation for the observed
phenomena also lead to accurate predictions.

Predictive Models vs. Explanatory Models: A General Distinction.


In practice, models that lead to good predictions do not always explain the
observed phenomena. Vice versa, models that nicely explain the corresponding
phenomena do not always lead to most accurate predictions.
To illustrate the difference, let us give a simple example from celestial
mechanics; see, e.g., [1]. Newton’s equations provide a very clear explanation
of why and how celestial bodies move, why the planets and satellites follow
their orbits, etc. In principle, we can predict the trajectories of celestial bodies
– and thus, their future observed positions in the sky – by directly integrating

the corresponding differential equations. This would, however, require a lot of


computation time on modern computers.
On the other hand, people successfully predicted the observed positions of
planets long before Newton: for that, they used epicycles, i.e., in effect, trigonomet-
ric series. Such series are still used in celestial mechanics to predict the positions
of celestial bodies. They are very good for predictions, but they are absolutely
useless in explanations.
Predictive Models vs. Explanatory Models in Statistics. In statistics, the
need to differentiate between predictive and explanatory models was emphasized
and illustrated by Galit Shmueli in [3].
Remaining Problem: Why? The empirical fact that the best predictive mod-
els are often different from the best explanatory models is currently well known
and well recognized.
But from the theoretical viewpoint, this empirical fact still remains a puz-
zle. In this paper, we provide a theoretical explanation for this empirical
phenomenon.

2 Towards Formal (Precise) Definitions: Analysis of the


Problem
Need for Formalization. In order to provide a theoretical explanation for the
difference between the best predictive and the best explanatory models, we need
to first formally describe:
• what it means for a model to be the best predictive model, and
• what it means for a model to be the best explanatory model.
What Does It Mean for a Model to Be Explanatory: Analysis of the
Problem. The “explanatory” part is intuitively understandable: we have some
equations or formulas that explain all the observed data – in the sense that all
the observed data satisfy these equations.
Of course, these equations must be checkable – otherwise, if they are formu-
lated purely in terms of complex abstract mathematics, so that no one knows
how to check whether observed data satisfy these equations or formulas, then
how can we know that the data satisfies them?
Thus, when we say that we have an explanatory model, what we are saying, in effect, is that we have an algorithm – a program if you will – that, given the data, checks whether the data is consistent with the corresponding equations or formulas. From this pragmatic viewpoint, by an explanatory model, we simply mean a program.
Of course, this program must be non-trivial: it is not enough for the data
to be simply consistent with the model; explanatory means that we must explain
all this data. For example, if we simply state that, in general, the trade volume
grows when the GDP grows, all the data may be consistent with this rule, but
this consistency is not enough: for a model to be truly explanatory, it needs to

explain why in some cases, the growth in trade is small and in other cases, it is
huge. In other words, it must explain the exact growth rate. Of course, this is
economics, not fundamental physics, we cannot explain all the numbers based
on first principles only, we have to take into account some quantities that affect
our processes. But for the model to be truly explanatory we must be sure that,
once the values of these additional quantities are fixed, there should be only one
sequence of numbers that satisfies the corresponding equations or formulas –
namely, the sequence that we observe (ignoring noise, of course).
This is not that different from physics. For example, Newton’s laws of gravi-
tation allow many possible orbits of celestial bodies, but once you fix the masses,
initial conditions, and initial velocities of all these bodies, then Newton’s laws
uniquely determine how these bodies will move.
In algorithmic terms, if:

• to the original program for checking whether the data satisfies the given
equations and/or formulas,
• we add auxiliary parts checking whether the values of additional quantities
are exactly the ones needed to explain the data,
• then the observed data is the only possible sequence of observations that is
consistent with this program.

Once we know such a program that uniquely determines all the data, we can,
in principle, find this data – i.e., solve the corresponding equations – by simply
trying all possible combinations of possible data values until we find the one that
satisfies all the corresponding conditions.
How can we describe this in precise terms? All the observations can be stored
in the computer, and in the computer, everything is stored as 0s and 1s. From
this viewpoint, the whole set of observed data is simply a binary sequence x,
i.e., a finite sequence of 0s and 1s.
The length n of this sequence is known. We know how many binary sequences
there are of each length:

• there are 2 sequences of length 1: 0 and 1;


• there are 2 × 2 = 2^2 = 4 sequences of length 2:
– two sequences 00 and 01 that start with 0 and
– two sequences 10 and 11 that start with 1;
• there are 2^2 × 2 = 2^3 = 8 binary sequences of length 3:
– two sequences 000 and 001 that start with 00,
– two sequences 010 and 011 that start with 01,
– two sequences 100 and 101 that start with 10, and
– two sequences 110 and 111 that start with 11;
• in general, we have 2^n sequences of length n.

There are finitely many such sequences, so we can potentially check them
all and thus, find the desired sequence x – the only one that satisfies all the
required conditions.
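As a simple illustration of this exhaustive enumeration, the following Java sketch lists all 2^n binary sequences of a small length n and keeps the ones accepted by a checking routine; the routine satisfiesConditions here is only a toy stand-in for the actual checking program described above.

public class EnumerateSequences {

    // Toy stand-in for the "checking program": here it just checks that the
    // sequence has equally many 0s and 1s. In the actual setting, this routine
    // would check consistency with the model's equations or formulas.
    static boolean satisfiesConditions(String s) {
        return 2 * s.chars().filter(c -> c == '1').count() == s.length();
    }

    public static void main(String[] args) {
        int n = 4;  // length of the observed data, in bits
        for (int i = 0; i < (1 << n); i++) {
            // write i in binary and pad with leading zeros to length n
            String s = String.format("%" + n + "s", Integer.toBinaryString(i)).replace(' ', '0');
            if (satisfiesConditions(s)) {
                System.out.println(s);
            }
        }
    }
}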

Of course, for large n, the time 2^n can be astronomically large,


so we are talking about the potential possibility of computing – not practical compu-
tations: one does not solve Newton’s equations by trying all possible trajectories
and checking whether they satisfy Newton’s equations. But it is OK, since our
goal here is not to provide a practical solution to the problem, but rather to
provide a formal definition of an explanatory model.
For the purpose of this definition, we can associate each explanatory model
not only with the original checking program, but also with the related exhaustive-
search program p that generates the data. The exhaustive search part is easy to
program; it practically does not add to the length of the original checking program.
So, we arrive at the following definition.

Definition 1. Let a binary sequence x be given. We will call this sequence


data. By an explanatory model, we mean a program p that generates the binary
sequence x.

Comments
• The above definition, if we read it without the previous motivations part,
sounds very counter-intuitive. However, we hope that the motivation part has
convinced the reader that this strange-sounding definition indeed describes
what we usually mean by an explanatory model.
• For each data, there is at least one explanatory model – since we can always
have a program that simply prints all the bits of the given sequence x one by
one.
What Do We Mean by the Best Explanatory Model: Analysis of the
Problem. There are usually several possible explanatory models; which of them
is the best?
To formalize this intuitive notion, let us again go back to physics. Before
Newton, the motion of celestial bodies was described by epicycles. To accurately
describe the motion of each planet, we needed to know a large number of param-
eters:
• in the first approximation, in which the orbit is a circle, we need to know the
radius of this circle, the planet’s initial position on this circle, and its velocity;
• in the second approximation, we need to know similar parameters of the first
auxiliary circular motion that describes the deviation from the circle;
• in the third approximation, we need to know similar parameters of the
second auxiliary circular motion describing the deviation from the second-
approximation trajectory, etc.
Then came Kepler’s idea that celestial bodies follow elliptical trajectories.
Why was this idea better than epicycles? Because now, to describe the trajectory
of each celestial body, we need fewer parameters: all we need is a few parameters
that describe the corresponding ellipse.
These original parameters formed the main part of the corresponding
data checking program – and thus, of the resulting data generating program.

By reducing the number of such parameters, we thus drastically reduced the


length of the checking program – and thus, of the generating program corre-
sponding to the model.
Similarly, what Newton did was replace all the parameters of the ellipses by a few parameters describing the bodies themselves – and this described not only the regular motion of celestial bodies, but also the tides, and it explained why apples fall from a tree and exactly how, etc. Here, we also
have fewer parameters needed to explain the observed data – and thus, a much
shorter generating program.
From this viewpoint, a model is better if its generating program is shorter
– and thus, the best explanatory model is the one which is the shortest, i.e.,
the one for which the (bit) length len(p) of the corresponding program p is the
smallest possible. So, we arrive at the following definition.

Definition 2. Let x be the data. We say that an explanatory model p0 for x is


the best explanatory model if it is the shortest of all explanatory models for x,
i.e., if
len(p0 ) = min{len(p) : p generates x}.

What Do We Mean by the Best Predictive Model. Clearly, not all models
which are explanatory models in the sense of Definition 1 can be used for prac-
tical predictions. If using a model requires the astronomical time 2^n of billions
of years, then the corresponding program is practically useless:

• if we need thousands of years to predict next year’s position of the Moon, we


do not need this program: we can as well wait a year and see where the Moon is;
• similarly, if a trade model takes 10 years of intensive computations to predict
next year’s trade balance, we do not need this program: we can as well wait
a year and see for ourselves.

For a model to be useful for predictions, it needs not just to generate the
data x but to generate it fast – as fast as possible. The corresponding overall
computation time includes both the time needed to upload this program into a
computer – which is proportional to the length len(p) of this program – and the
time t(p) needed to run this program.
From this viewpoint, the smaller this overall time len(p) + t(p), the better.
Thus, the best predictive model is the one for which this overall time is the
smallest possible. So, we arrive at the following definition.

Definition 3. Let x be the data. We say that a model p0 is the best predic-
tive model for x if its overall time len(p0 ) + t(p0 ) is the smallest among all the
explanatory models:

len(p0 ) + t(p0 ) = min{len(p) + t(p) : p generates x}.



3 Main Result: Formulation and Discussion


Now that we have formal definitions, we can formulate our main result. It comes
as two propositions.

Proposition 1. No algorithm is possible that, given data x, generates the best


explanatory model for this data.

Proposition 2. There exists an algorithm that, given data x, generates the best
predictive model for this data.

Discussion. These two results clearly explain why in many cases, the best pre-
dictive models are different from the best explanatory models:
• if they were always the same, then the algorithm from Proposition 2 would
also always generate the best explanatory models, but
• we know, from Proposition 1, that such a general algorithm is not possible.

4 Proofs
Discussion. It is usually easier to prove that an algorithm exists – all we need to
do is to provide such an algorithm. On the other hand, proving that an algorithm
does not exist is rarely easy: we need to provide some general arguments why
no tricks can lead to such an algorithm.
From this viewpoint, it is easier to prove Proposition 2 than Proposition 1.
Let us therefore start with proving Proposition 2.
Proof of Proposition 2. In line with the above idea, let us describe the cor-
responding algorithm. In this algorithm, to find the program that generates the
given data x in the shortest possible overall time T , we start with T = 1, then
take T = 2, T = 3, etc. – until we find the smallest value T for which such a
program exists.
For each T, we need to look for programs for which len(p) + t(p) = T. For
such programs, we have len(p) ≤ T , so we can simply try all possible binary
sequences p of length not exceeding T . There are finitely many strings of each
length, so there are finitely many strings p of length len(p) ≤ T , and we can try
them all.
For each of these strings, we first use a compiler to check whether this string
is indeed a syntactically correct program. If it is not, we simply dismiss this
string. If the string p is a syntactically correct program, we run it for time
t(p) = T − len(p), to make sure that the overall time is indeed T . If after this
time, the program p generates the desired sequence x, this means that we have
found the desired best predictive model, so we can stop – the fact that we did
not stop our procedure earlier, when we tested smaller values of the overall time
means that no program can generate x in overall time < T and thus, the overall
time T is indeed the smallest possible.
The proposition is proven.
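To make this search concrete, here is a small Java sketch in a deliberately toy setting: "programs" are strings in a two-construct toy language, with an arbitrarily chosen toy cost model, and the sketch selects, among the candidate programs that generate a given data string x, the one that minimizes len(p) + t(p) in the sense of Definition 3. The toy language, its cost model, and the helpers runTime and generates are illustrative assumptions; the actual proof enumerates all programs of a general-purpose language, for larger and larger overall time T.

import java.util.ArrayList;
import java.util.List;

public class BestPredictiveToy {

    // A toy "programming language", used only for illustration:
    //   "LIT:<x>"      prints the string x literally; toy running time t(p) = x.length()
    //   "REP:<s>:<k>"  prints s repeated k times;     toy running time t(p) = s.length() + k
    // Both the language and the cost model are illustrative assumptions.

    static int runTime(String program) {
        String[] parts = program.split(":", 3);
        if (parts[0].equals("LIT")) {
            return parts[1].length();
        }
        return parts[1].length() + Integer.parseInt(parts[2]);
    }

    // Does the toy program p generate the data x?
    static boolean generates(String p, String x) {
        String[] parts = p.split(":", 3);
        if (parts[0].equals("LIT")) {
            return parts[1].equals(x);
        }
        StringBuilder out = new StringBuilder();
        for (int j = 0; j < Integer.parseInt(parts[2]); j++) out.append(parts[1]);
        return out.toString().equals(x);
    }

    public static void main(String[] args) {
        String x = "010101010101";  // the observed data (toy example)

        // Candidate programs: the literal program plus all "repeat a prefix" programs.
        List<String> candidates = new ArrayList<>();
        candidates.add("LIT:" + x);
        for (int len = 1; len < x.length(); len++) {
            if (x.length() % len == 0) {
                candidates.add("REP:" + x.substring(0, len) + ":" + (x.length() / len));
            }
        }

        // The best predictive model, in the sense of Definition 3, minimizes len(p) + t(p).
        String best = null;
        int bestCost = Integer.MAX_VALUE;
        for (String p : candidates) {
            if (!generates(p, x)) continue;
            int cost = p.length() + runTime(p);
            if (cost < bestCost) { bestCost = cost; best = p; }
        }
        System.out.println("best: " + best + "   len(p) + t(p) = " + bestCost);
    }
}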

Comment. An attentive reader has probably noticed that this algorithm is an


exhaustive-search-type algorithm that requires exponential time 2^n. Yes, this
algorithm is not practical – but practicality is not our goal. Our goal is to explain
the difference between the best predictive and the best explanatory model, and
from the viewpoint of this goal, this slow algorithm serves its purpose: it shows
that:

• the best predictive models can be computed by some algorithm, while,


• as we will now prove, the best explanatory models cannot be computed by any
algorithm – even by a very slow one.

Proof of Proposition 1. The main idea behind this proof comes from the fact
that the quantity

K(x) = min{len(p) : p generates x}

is well known in theoretical computer science: it was introduced by the famous mathematician A. N. Kolmogorov and is thus known as Kolmogorov complexity; see, e.g., [2]. One of the first results that Kolmogorov proved about his new notion is that no algorithm is possible that, given a binary string x, would always
compute its Kolmogorov complexity K(x) [2].
This immediately implies our Proposition 1: indeed, if it was possible to
produce, for each data x, the best explanatory model p0 , then we would be
able to compute its length len(p0 ) which is exactly K(x) – and K(x) is not
computable.
The proposition is proven.
Discussion. It is worth mentioning that the notion of Kolmogorov complexity
was originally introduced for a somewhat related but still completely different
purpose – how to separate random from non-random sequences.
In traditional statistics, the very idea that some individual sequences are random and some are not was taboo: one could only talk about probabilities
of different sequences. However, intuitively, everyone understands that while a
sequence of bits generated by flipping a coin many times is random, a sequence
like 010101...01 in which 01 is repeated a million times is clearly not random. How
can we formally explain this intuitive difference?
Kolmogorov noticed that a sequence 0101...01 is not random because it can
be generated by a very short program: just repeat 01 many times. For example,
in Java, this program looks like this:

for (int i = 0; i < 1000000; i++) {
    System.out.print("01");  // print "01" a million times, with no newlines in between
}

On the other hand, if a sequence is truly random, there is no dependency between


different bits, so the only way to print this sequence is to literally print the whole
sequence bit by bit:

System.out.println("01...");

So, when x is not random, we can have short programs generating x. Thus, the
shortest possible length K(x) of a program generating x is much smaller than
the length len(x) of this sequence:

K(x) ≪ len(x).

On the other hand, for a truly random sequence x, you cannot generate it
by a program shorter than the above line whose length is ≈ len(x). So, in this
case,
K(x) ≈ len(x).
This idea inspired Kolmogorov to define what we now call Kolmogorov com-
plexity K(x) and to define a binary sequence to be random if K(x) ≥ len(x) − c0, for
some appropriate constant c0 .
Proof that Kolmogorov Complexity is Not Computable: Reminder.
In our proof of Proposition 1 we used Kolmogorov’s proof that Kolmogorov
complexity K(x) is not computable. To make our result more intuitive, it is
worth mentioning that this proof itself is reasonably intuitive.
The main idea behind this proof comes from the following Berry paradox.
Some English expressions describe numbers. For example:
• “twelve” means 12,
• “million” means 1000000, and
• “the smallest prime number larger than 100” means 101.
There are finitely many words in the English language, so there are finitely
many combinations of less than twenty words, thus finitely many numbers which
can be described by such combinations. Hence, there are numbers which cannot
be described by such combinations. Let n0 denote the smallest of such numbers.
Therefore, n0 is “the smallest number that cannot be described in fewer than
twenty words”. But this description of the number n0 consists of 12 words –
less than 20, so n0 can be described by using fewer than twenty words – a clear
paradox.
This paradox is caused by the imprecision of natural language, but if we
replace “described” by “computed”, we get a proof by contradiction that Kol-
mogorov complexity is not computable.
Indeed, let us assume that K(x) is computable, and let L be the length of
the program that computes K(x). Binary sequences can be interpreted as binary
integers, so we can talk about the smallest of them. Then, the following program
computes the smallest sequence x for which K(x) ≥ 3L: we try all possible binary
sequences of length 1, length 2, etc., until we find the first sequence for which
K(x) ≥ 3L:
int x = 0;                     // interpret binary sequences as binary integers
while (K(x) < 3 * L) { x++; }  // K(x): the assumed program that computes Kolmogorov complexity
This program adds just two short lines to the length-L program for computing
K(x); thus, its length is ≈ L ≪ 3L, so for the number x0 that it computes, its Kolmogorov complexity – the length of the shortest program generating x0 – cannot exceed this length. Thus, we have K(x0) ≪ 3L.
On the other hand, we defined x0 as the smallest number for which K(x) ≥
3L, so we have K(x0 ) ≥ 3L – a contradiction. This contradiction shows that our
assumption is wrong, and the Kolmogorov complexity is not computable.

Acknowledgments. This work was supported by the Center of Excellence in Econo-


metrics, Faculty of Economics, Chiang Mai University, Thailand. We also acknowledge
the partial support of Department of Mathematics, Chiang Mai University, and of the
US National Science Foundation via grant HRD-1242122 (Cyber-ShARE Center of
Excellence).
The authors are greatly thankful to Professors Hung T. Nguyen and Galit Shmueli
for valuable discussions.

References
1. Feynman, R., Leighton, R., Sands, M.: The Feynman Lectures on Physics. Addison
Wesley, Boston (2005)
2. Li, M., Vitányi, P.M.B.: An Introduction to Kolmogorov Complexity and Its Appli-
cations. Springer, Berlin (2008)
3. Shmueli, G.: To explain or to predict? Stat. Sci. 25(3), 289–310 (2010)
Algorithmic Need for Subcopulas

Thach Ngoc Nguyen1, Olga Kosheleva2, Vladik Kreinovich2(B), and Hoang Phuong Nguyen3

1 Banking University of Ho Chi Minh City, 56 Hoang Dieu 2, Quan Thu Duc, Thu Duc, Ho Chi Minh City, Vietnam; Thachnn@buh.edu.vn
2 University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA; {olgak,vladik}@utep.edu
3 Division Informatics, Math-Informatics Faculty, Thang Long University, Nghiem Xuan Yem Road, Hoang Mai District, Hanoi, Vietnam; nhphuong2008@gmail.com

Abstract. One of the efficient ways to describe the dependence between


random variables is by describing the corresponding copula. For contin-
uous distributions, the copula is uniquely determined by the correspond-
ing distribution. However, when the distributions are not continuous,
the copula is no longer unique, what is unique is a subcopula, a function
C(u, v) that has values only for some pairs (u, v). From the purely math-
ematical viewpoint, it may seem like subcopulas are not needed, since
every subcopula can be extended to a copula. In this paper, we prove,
however, that from the algorithmic viewpoint, it is, in general, not pos-
sible to always generate a copula. Thus, from the algorithmic viewpoint,
subcopulas are needed.

1 Formulation of the Problem

Copulas: A Brief Reminder. There are many ways to describe a probability


distribution of a random variable:

• we can use its probability density function (pdf),


• we can use its moments,
• its cumulative distribution function (cdf), etc.
Most of these types of descriptions are not always applicable:

• for a discrete distribution, pdf is not defined,


• for a distribution with heavy tails, moments are sometimes infinite, etc.

Out of the known representations, the representation as a cdf is the most universal: it does not seem to have such limitations. In view of this, to take into account that in econometrics, one can encounter discrete distributions (for which the pdf is not defined), heavy-tailed distributions (for which moments are infinite), etc., it is
reasonable to use a cdf

FX (x) = Prob(X ≤ x)
to describe a random variable X.
Similarly, to describe a joint distribution of two random variables (X, Y ), it
is reasonable to use a joint cdf

FXY (x, y) = Prob(X ≤ x & Y ≤ y).

When random variables X and Y are independent, we have FXY (x, y) =


FX (x) · FY (y). In general, the dependence may be more complicated. It is rea-
sonable to describe this dependence by a function C(u, v) for which

FXY (x, y) = C(FX (x), FY (y)). (1)

A function with this property is known as a copula; see, e.g., [3,6–8]. Copulas have
been successfully used in many application areas, in particular, in econometrics.
Existence and Uniqueness of Copulas. It has been proven that such a
copula always exists, and that the copula function C(u, v) is itself a 2-D cdf on
the square
[0, 1] × [0, 1].
In situations when the distributions of X and Y are continuous – e.g., when
there exist pdfs – the copula is uniquely determined. Indeed, in this case, the
value FX (x) continuously depends on x and thus, attains all possible values
between 0 and 1. So, to find C(u, v), it is sufficient to find the values x and y for
which FX(x) = u and FY(y) = v; then FXY(x, y) will give us the desired value
of C(u, v).
However, if the distribution of one of the variables – e.g., X – is discrete (for
example, there are some values which have positive probabilities), then the value
FX(x) jumps, and thus, for some intermediate values u, we do not have values
x for which FX (x) = u. In such situations, the copula is not uniquely determined,
since we can have different values C(u, v) for this jumped-over u.
Subcopulas: Reminder. While the copula is not always unique, there is a vari-
ant of this notion which is always unique; this variant is known as a subcopula. In
precise terms, a subcopula is also defined by the formula (1); the only difference
is that:

• while a copula has to be defined for all possible values u ∈ [0, 1] and v ∈ [0, 1],
• a subcopula C(u, v) is only defined for the values u and v which have the
form u = FX (x) and v = FY (y) for some x and y.

Subcopulas have also been successfully used in econometrics; see, e.g., [9,11,13,
14,16,17].
Main Question: Do We Need Subcopulas? From the purely mathematical
viewpoint, it may seem that we do not need subcopulas, since every subcopula can
be, in principle, extended to a copula.

However, the fact that many researchers use subcopulas seems to indicate
that, from the algorithmic viewpoint, subcopulas may not be easy to extend
to copulas. An indirect argument in support of this difficulty is that known
extension proofs use non-constructive arguments such as Zorn’s Lemma (which
is equivalent to a non-constructive Axiom of Choice); see, e.g., [2].
What We Do in This Paper. In this paper, we prove that indeed, in situations
of non-uniqueness, it is not algorithmically possible to always construct a copula.
In other words, we prove that, from the algorithmic viewpoint, subcopulas are
indeed needed.

2 What Is Computable: A Brief Reminder


What is Computable: Main Definitions. In order to analyze when a copula
is computable and when it is not, let us recall the main definitions of computabil-
ity; for details, see, e.g., [1,4,15] (for random variables, see also [5]).
A real number x is computable if we can compute it with any given accuracy.
In other words, a number is computable if there exists an algorithm that, given
an integer n (describing the accuracy), returns a rational number rn for which

|x − rn| ≤ 2^{−n}.
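As a minimal Java illustration of this definition – with sqrt(2) as the example of a computable number – a computable real can be represented by a routine that, given n, returns an approximation guaranteed to be within 2^{−n}:

import java.math.BigDecimal;

public class ComputableReal {

    // In the sense of the definition above, a computable real is given by a routine
    // that, for each n, returns a rational approximation r_n with |x - r_n| <= 2^{-n}.
    interface Computable {
        BigDecimal approx(int n);
    }

    // Example: sqrt(2), approximated by bisection on [1, 2].
    static final Computable SQRT_TWO = n -> {
        BigDecimal two = BigDecimal.valueOf(2);
        BigDecimal lo = BigDecimal.ONE, hi = two;
        BigDecimal tol = BigDecimal.ONE.divide(two.pow(n));  // 2^{-n}, exact in decimal
        while (hi.subtract(lo).compareTo(tol) > 0) {
            BigDecimal mid = lo.add(hi).divide(two);          // halving keeps decimals exact
            if (mid.multiply(mid).compareTo(two) <= 0) lo = mid; else hi = mid;
        }
        return lo;  // the interval [lo, hi] contains sqrt(2) and is shorter than 2^{-n}
    };

    public static void main(String[] args) {
        for (int n = 1; n <= 21; n += 10) {
            System.out.println("n = " + n + ":  r_n = " + SQRT_TWO.approx(n));
        }
    }
}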

Intuitively, a function f (x) is computable if there is an algorithm that, given


x, returns the value f (x). In precise terms, this means that for any desired
accuracy n, we can compute a rational number rn for which |f(x) − rn| ≤ 2^{−n}; in this computation, the program can pick some integer m and ask for a 2^{−m}-
approximation to the input.
Similarly, a function f (x, y) of two variables is called computable if, given
x and y, it can compute the value f (x, y) with any given accuracy. Again, in
the process of computations, this program can pick some m and ask for a 2^{−m}-
approximation to x and to y.
Comment. These definitions describe the usual understanding of computability; so, not surprisingly, all usual computable functions – e.g., all elementary functions – are computable in this sense as well.
What is Not Computable. What is not computable in this sense are discon-
tinuous functions such as sign(x) which is equal:
• to −1 when x < 0,
• to 0 when x = 0, and
• to 1 when x > 0.
Indeed, if this function was computable, then we would be able to check whether
a computable real number is equal to 0 or not, and such checking is not algo-
rithmically possible; see, e.g., [4] and references therein.
Indeed, the possibility of such checking contradicts the known result that
it is not possible,

• given a program,
• to check whether this program halts or not.
Indeed, based on each program, we can form a sequence rn, each element of which is:
• equal to 2^{−n} if the program did not yet halt by time n, and
• equal to 2^{−t} if it halted at time t ≤ n.
Then:
• If the program does not halt, this sequence describes the computable real
number
x = 0.
• If the program halts at time t, this sequence describes the computable real
number
x = 2^{−t} > 0.
Thus, if we could check whether a real number is equal to 0 or not, we would
be able to check whether a program halts or not – and we know that this is not
algorithmically possible.
What Does It Mean for the cdf to be Computable. In real life, when we
say that we have a random variable, it means that we have a potentially infinite
sequence of observations which follow the corresponding distribution. Based on
these observations, for each computable real number x, we would like to compute
the value F (x).
The value F (x) is the probability that the value of a random variable X is
≤ x. A natural practical way to estimate a probability based on a finite sample
is to estimate the frequency of the corresponding event. Thus, to estimate F (x),
a natural idea is to take n observations X1 , . . . , Xn , find out how many of them
are ≤ x, and then compute the desired frequency by dividing the result of the
counting by n.
Even in the ideal case, when all the values Xi are measured exactly, the
frequency is, in general, different from the probability. It is known (see, e.g.,
[12]) that for large n, the difference between the frequency f and the probability
p is approximately normally distributed, with 0 mean and standard deviation

σ = √(p · (1 − p)/n) ≤ 0.5/√n.
From the practical viewpoint, any deviation larger than 6 sigma has a probability
of less than 10^{−8} and is, thus, usually considered practically impossible. (If you
do not view 6 sigma as impossible, take 20 sigma; one can always come up with a
probability so small that it is practically impossible.) Thus, if for a given ε > 0, we select n so large that 6σ ≤ 6 · 0.5/√n ≤ ε, then the resulting frequency f is guaranteed to be ε-close to the desired probability F(x): |f − F(x)| ≤ ε, i.e.,
equivalently,
F (x) − ε ≤ f ≤ F (x) + ε.
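In other words, the condition 6σ ≤ 6 · 0.5/√n ≤ ε means that it suffices to take n ≥ 9/ε^2 observations; a minimal Java sketch of this computation:

public class SampleSize {

    // Smallest n with 6 * 0.5 / sqrt(n) <= eps, i.e., n >= 9 / eps^2.
    static long requiredSampleSize(double eps) {
        return (long) Math.ceil(9.0 / (eps * eps));
    }

    public static void main(String[] args) {
        for (double eps : new double[] {0.1, 0.01, 0.001}) {
            System.out.println("eps = " + eps + "  ->  n >= " + requiredSampleSize(eps));
        }
    }
}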

In practice, we also need to take into account that the values Xi can only
be measured with a certain accuracy δ; see, e.g., [10]. Thus, what we compare with the given number x are not the actual values Xi but the results X̃i of their measurement, which are δ-close to Xi:

• If X̃i ≤ x, we cannot conclude that Xi ≤ x; we can only conclude that Xi ≤ x + δ.
• Similarly, if X̃i > x, we cannot conclude that Xi > x; we can only conclude that Xi > x − δ.

Thus, the only thing that we can guarantee for the observed frequency f is that

F (x − δ) − ε ≤ f ≤ F (x + δ) + ε. (2)

This is how a computable cdf is defined: given a computable number x and rational numbers ε > 0 and δ > 0, we can efficiently find a rational number
f that satisfies the inequality (2).
A similar inequality

F (x − δ, y − δ) − ε ≤ f ≤ F (x + δ, y + δ) + ε (3)

defines a computable 2-D cdf.


Comment. Note that a cdf can be discontinuous – e.g., if we have a random
variable that is equal to 0 with probability 1, then:

• F (x) = 0 for x < 0 and


• F (x) = 1 for x ≥ 0.

We already know such a function cannot be computable, so a computable cdf is


not necessarily a computable function.
However, as we will see, when the computable cdf is continuous, it is a com-
putable function.

Proposition 1. When the distributions are continuous, a computable cdf is a


computable function.

Proof. Indeed, the inequalities (2) can be rewritten as follows:

fδ,ε(x − δ) − ε ≤ F(x) ≤ fδ,ε(x + δ) + ε,    (4)


where fδ,ε (x) means a frequency estimated by comparing the measured values
Xi (measured with accuracy δ) with the value x, based on a sample large enough
to guarantee the accuracy ε.

Also, due to (2), we have

F (x − 2δ) − ε ≤ fδ,ε (x − δ) and fδ,ε (x + δ) ≤ F (x + 2δ) + ε,



thus

F (x − 2δ) − 2ε ≤ fδ,ε (x − δ) − ε ≤ F (x) ≤ fδ,ε (x + δ) + ε ≤ F (x + 2δ) + 2ε. (5)

When the cdf F (x) is a continuous function, then, for each x, the difference

F (x + 2δ) − F (x − 2δ)

tends to 0 as δ decreases. Thus, the difference between the values F (x + 2δ) + 2ε


and F(x − 2δ) − 2ε also tends to 0 as δ → 0 and ε → 0. So, if we take δ = ε = 2^{−k} for k = 1, 2, . . ., we will eventually encounter an integer k for which this difference is smaller than a given number 2^{−n}. In this case, due to (5), the difference between the inner bounds fδ,ε(x + δ) + ε and fδ,ε(x − δ) − ε is also ≤ 2^{−n}. In this case, each of these bounds can be used as the desired 2^{−n}-approximation to
F (x).
Thus, to compute F(x) with accuracy 2^{−n}, it is sufficient to compute, for
k = 1, 2, . . .,

• values ε = δ = 2^{−k} and then


• values fδ,ε (x + δ) + ε and fδ,ε (x − δ) − ε.

We continue these computations for larger and larger k until the difference
between fδ,ε(x + δ) + ε and fδ,ε(x − δ) − ε becomes smaller than or equal to 2^{−n}. Once this condition is satisfied, we return fδ,ε(x + δ) + ε as the desired 2^{−n}-
approximation to F (x).
The proposition is proven.
Comment. In the 2-D case, we can use a similar proof.
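For illustration, here is a Java sketch of the procedure from the proof of Proposition 1, with an exponential distribution (true cdf 1 − e^{−x}) standing in for the unknown continuous distribution and with rounding to multiples of δ standing in for measurement inaccuracy; as in the 6-sigma argument above, the accuracy guarantee is probabilistic.

import java.util.Random;

public class ComputableCdf {
    static final Random RNG = new Random(1);

    // One "observation" of an exponential random variable with rate 1
    // (an illustrative choice of a continuous distribution; its true cdf is 1 - e^{-x}).
    static double observe() {
        return -Math.log(1.0 - RNG.nextDouble());
    }

    // Frequency estimate f_{delta,eps}(y): fraction of delta-rounded observations <= y,
    // with the sample size n >= 9 / eps^2 chosen as discussed above.
    static double frequency(double y, double delta, double eps) {
        long n = (long) Math.ceil(9.0 / (eps * eps));
        long count = 0;
        for (long i = 0; i < n; i++) {
            double measured = Math.round(observe() / delta) * delta;  // measurement accuracy delta
            if (measured <= y) count++;
        }
        return (double) count / n;
    }

    // The halving procedure from the proof: delta = eps = 2^{-k} for k = 1, 2, ...
    // until the two bounds are within 2^{-n} of each other (with overwhelming probability).
    static double approxCdf(double x, int n) {
        for (int k = 1; ; k++) {
            double d = Math.pow(2, -k);
            double lower = frequency(x - d, d, d) - d;
            double upper = frequency(x + d, d, d) + d;
            if (upper - lower <= Math.pow(2, -n)) return upper;
        }
    }

    public static void main(String[] args) {
        double x = 1.0;
        System.out.println("estimate:  " + approxCdf(x, 5));
        System.out.println("true F(x): " + (1 - Math.exp(-x)));
    }
}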

3 Main Results and Their Proofs

Proposition 2. There exists an algorithm that, given a continuous computable


cdf FXY (x, y), generates the corresponding copula – i.e., generates a computable
cdf C(u, v) that satisfies the formula (1).

Proposition 3. No general algorithm is possible that, given a computable cdf


FXY(x, y), would generate the corresponding copula – i.e., that would generate
a computable cdf C(u, v) that satisfies the formula (1).

Comment. This result proves that from the algorithmic viewpoint, it is, in gen-
eral, not possible to always generate a copula. Thus, from the algorithmic view-
point, subcopulas are indeed needed.
Proof of Proposition 2. This proof is reasonably straightforward: it follows
the above idea of finding the copula for a continuous cdf.

Indeed:
• suppose that we are given two computable numbers u, v ∈ [0, 1], and
• we want to find the desired approximation to the value C(u, v).
To do that, we first find x for which FX (x) is δ-close to u.
This value can be found as follows. First, we pick any x0 and compute FX (x0 )
with accuracy δ; we can do it, since, according to Proposition 1, for continuous
distributions, the cdf is a computable function. If we get a value which is δ-close
to u, we are done.
If the approximate value FX (x0 ) is larger than u, we take x0 − 1, x0 − 2, etc.,
until we find a new value x− for which FX (x− ) < u.
Similarly, if the approximate value FX (x0 ) is smaller than u, we take x0 + 1,
x0 + 2, etc., until we find a new value x+ for which FX (x+ ) > u.
In both cases, we have an interval [x− , x+ ] for which F (x− ) < u < F (x+ ).
Now, we can use bisection to find the desired x: namely, we take a midpoint xm
of the interval. Then:
• If |FX (xm ) − u| ≤ δ, we are done.
• If this desired inequality is not satisfied, then we have:
– either FX (xm ) < u
– or FX (xm ) > u.
• In the first case, we know that the desired value x is in the half-size interval
[xm , x+ ].
• In the second case, we know that the desired value x is in the half-size interval
[x− , xm ].
• In both cases, we get a new half-size interval.
To the new interval, we apply the same procedure until we get the desired x.
Similarly, we can compute y for which FY (y) ≈ v. Now, we can take the
approximation to FXY (x, y) as the desired approximation to C(u, v).
The proposition is proven.
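Here is a Java sketch of this construction for one concrete, illustrative choice of a continuous computable cdf: X exponential with rate 1 and Y = X, so that FXY(x, y) = FX(min(x, y)) and the resulting copula should be C(u, v) = min(u, v). The bracketing-plus-bisection step mirrors the procedure described in the proof.

public class CopulaFromCdf {

    // A concrete illustrative example: X is exponential(1) and Y = X, so
    // F_X(x) = F_Y(x) = 1 - e^{-x} for x >= 0, F_XY(x, y) = F_X(min(x, y)),
    // and the true copula is C(u, v) = min(u, v).
    static double fX(double x) { return x <= 0 ? 0 : 1 - Math.exp(-x); }
    static double fXY(double x, double y) { return fX(Math.min(x, y)); }

    // The procedure from the proof: bracket a point with F_X larger than u,
    // then bisect until |F_X(x) - u| <= delta.
    static double inverseByBisection(double u, double delta) {
        double lo = 0, hi = 1;
        while (fX(hi) < u) hi += 1;            // move right until F_X(hi) >= u
        while (true) {
            double mid = (lo + hi) / 2;
            double val = fX(mid);
            if (Math.abs(val - u) <= delta) return mid;
            if (val < u) lo = mid; else hi = mid;
        }
    }

    // C(u, v) is approximated by F_XY(x, y) with F_X(x) close to u and F_Y(y) close to v.
    static double copula(double u, double v, double delta) {
        double x = inverseByBisection(u, delta);
        double y = inverseByBisection(v, delta);
        return fXY(x, y);
    }

    public static void main(String[] args) {
        double u = 0.3, v = 0.7, delta = 1e-6;
        System.out.println("computed C(u,v)  = " + copula(u, v, delta));
        System.out.println("expected min(u,v) = " + Math.min(u, v));
    }
}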
Proof of Proposition 3. For each real number a for which |a| ≤ 0.5, we can
form the following probability distribution Fa (x, y) on the square [0, 1] × [0, 1]:
it is uniformly distributed on a straight line segment y = 0.5 + sign(a) · (x − 0.5)
corresponding to x ∈ [0.5 − |a|, 0.5 + |a|]. Thus:
• when a > 0, we take y = x; and
• when a < 0, we take y = 1 − x.
One can easily check that Fa (x, y) is indeed a computable cdf – although it
is not always a computable function, since, e.g., for a = 0 the whole probability
distribution is concentrated at the point (0.5, 0.5).
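To see this family concretely, here is a small Java sketch of sampling from Fa; it only illustrates the construction – the point of the proof is that the sign of a cannot be determined algorithmically, not that Fa is hard to simulate.

import java.util.Random;

public class FaSampler {

    // One sample from the distribution F_a described above: X is uniform on
    // [0.5 - |a|, 0.5 + |a|], and Y = X when a > 0, Y = 1 - X when a < 0
    // (for a = 0, the whole mass sits at the point (0.5, 0.5)).
    static double[] sample(double a, Random rng) {
        double x = 0.5 - Math.abs(a) + 2 * Math.abs(a) * rng.nextDouble();
        double y = (a >= 0) ? x : 1 - x;
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        for (double a : new double[] {0.4, -0.4, 0.0}) {
            double[] p = sample(a, rng);
            System.out.println("a = " + a + "  ->  (" + p[0] + ", " + p[1] + ")");
        }
    }
}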
For each a, the marginal distribution of X is uniform on the
interval [0.5−|a|, 0.5+|a|] of length 2|a|. Thus, for the values x from this interval,
we have
FX(x) = (x − (0.5 − |a|)) / (2|a|).

So for any u ∈ [0, 1], to get FX (x) = u, we must take

x = 0.5 − |a| + 2u · |a| = 0.5 − (1 − 2u) · |a|.

Similarly, to get the value y for which FY (y) = v, we should take

y = 0.5 − (1 − 2v) · |a|.

For u, v ≤ 0.5, we get x ≤ 0.5 and y ≤ 0.5.


In particular, for u = v = 0.25, we should take x = y = 0.5 − 0.5|a|, and for
u = v = 0.5, we should take x = y = 0.5.
Since the distribution is symmetric, when u = v, we have the same values
x = y for which FX (x) = u and FY (y) = u.

• When a > 0, we have X = Y . Thus for every u = v, we have

C(u, u) = FXY (x, x) = Prob(X ≤ x & Y ≤ x) = Prob(X ≤ x) = u,

i.e., C(u, u) = u. In particular, we have

C(0.25, 0.25) = 0.25 and C(0.5, 0.5) = 0.5.

• When a < 0, then Y = 1 − X. Thus, when X ≤ 0.5, we have Y ≥ 0.5, so


when u, v ≤ 0.5, we cannot (except on a probability-0 event) have both X ≤ x and Y ≤ y, and thus, we get

C(u, u) = FXY (x, x) = Prob(X ≤ x & Y ≤ x) = 0.

In particular, we have C(0.25, 0.25) = C(0.5, 0.5) = 0.

If it was possible, given a computable real number a, to compute a computable


cdf C(u, v), then, by definition of a computable cdf, we would be able to compute:

• given a,
• the value fδ,ε (x) corresponding to x = 0.375, δ = 0.125 and ε = 0.1,
i.e., the value f = f0.125,0.1(0.375) for which

C(x − δ, x − δ) − ε ≤ f ≤ C(x + δ, x + δ) + ε,

i.e., for which


C(0.25, 0.25) − 0.1 ≤ f ≤ C(0.5, 0.5) + 0.1.

Here:

• when a < 0, we have C(0.5, 0.5) = 0, hence f ≤ 0.1 and therefore f < 0.125;
• when a > 0, then C(0.25, 0.25) = 0.25, hence f ≥ 0.15 and therefore f > 0.125.

So, by comparing the resulting value f with 0.125, we would be able to check whether
a > 0 or a < 0 – and this is known to be algorithmically impossible; see,
e.g., [1,4,15]. This contradiction shows that it is indeed not possible to have an
algorithm that always computes the copula.
The proposition is proven.

Acknowledgments. This work was supported in part by the US National Science


Foundation via grant HRD-1242122 (Cyber-ShARE Center of Excellence).
The authors are thankful to Professor Hung T. Nguyen for valuable discussions.

References
1. Bishop, E., Bridges, D.: Constructive Analysis. Springer Verlag, Heidelberg (1985)
2. Fernández-Sánchez, J., Úbeda-Flores, M.: Proving Sklar’s theorem via Zorn’s
lemma. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 26(1), 81–85 (2018)
3. Jaworski, P., Durante, F., Härdle, W.K., Rychlik, T. (eds.): Copula Theory and
Its Applications. Springer, Heidelberg (2010)
4. Kreinovich, V., Lakeyev, A., Rohn, J., Kahl, P.: Computational Complexity and
Feasibility of Data Processing and Interval Computations. Kluwer, Dordrecht
(1997)
5. Kreinovich, V., Pownuk, A., Kosheleva, O.: Combining interval and probabilistic
uncertainty: what is computable? In: Pardalos, P., Zhigljavsky, A., Zilinskas, J.
(eds.) Advances in Stochastic and Deterministic Global Optimization, pp. 13–32.
Springer, Cham (2016)
6. Mai, J.-F., Scherer, M.: Simulating Copulas: Stochastic Models, Sampling Algo-
rithms, and Applications. World Scientific, Singapore (2017)
7. McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts,
Techniques, and Tools. Princeton University Press, Princeton (2015)
8. Nelsen, R.B.: An Introduction to Copulas. Springer, Heidelberg (2007)
9. Okhrin, O., Okhrin, Y., Schmidt, W.: On the structure and estimation of hierar-
chical Archimedean copulas. J. Econ. 173, 189–204 (2013)
10. Rabinovich, S.G.: Measurement Errors and Uncertainty: Theory and Practice.
Springer, Berlin (2005)
11. Ruppert, M.: Contributions to Static and Time-varying Copula-based Modeling of
Multivariate Association. EUL Verlag, Koeln (2012)
12. Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures.
Chapman and Hall/CRC, Boca Raton (2011)
13. Wei, Z., Wang, T., Nguyen, P.A.: Multivariate dependence concepts through cop-
ulas. Int. J. Approx. Reason. 65(1), 24–33 (2015)
14. Wei, Z., Wang, T., Panichkitkosolkul, W.: Dependence and association concepts
through copulas. In: Huynh, V.-N., Kreinovich, V., Sriboonchitta, S. (eds.) Mod-
eling Dependence in Econometrics, pp. 113–126. Springer, Cham (2014)
15. Weihrauch, K.: Computable Analysis. Springer, Berlin (2000)

16. Zhang, Y., Beer, M., Quek, S.T.: Long-term performance assessment and design
of offshore structures. Comput. Struct. 154, 101–115 (2015)
17. Zhu, X., Wang, T., Choy, S.T.B., Autchariyapanitkul, K.: Measures of mutually
complete dependence for discrete random vectors. In: Kreinovich, V., Sriboon-
chitta, S., Chakpitak, N. (eds.) Predictive Econometrics and Big Data, pp. 303–
317. Springer, Cham (2018)
How to Take Expert Uncertainty into
Account: Economic Approach Illustrated
by Pavement Engineering Applications

Edgar Daniel Rodriguez Velasquez1,2, Carlos M. Chang Albitres2, Thach Ngoc Nguyen3, Olga Kosheleva4, and Vladik Kreinovich4(B)

1 Department of Civil Engineering, Universidad de Piura in Peru (UDEP), Av. Ramón Mugica 131, Piura, Peru; edgar.rodriguez@udep.pe, edrodriguezvelasquez@miners.utep.edu
2 Department of Civil Engineering, University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA; cchangalbitres2@utep.edu
3 Banking University of Ho Chi Minh City, 56 Hoang Dieu 2, Quan Thu Duc, Thu Duc, Ho Chi Minh City, Vietnam; Thachnn@buh.edu.vn
4 University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA; {olgak,vladik}@utep.edu

Abstract. In many application areas, we rely on expert estimates. For


example, in pavement engineering, we often rely on expert graders to
gauge the condition of road segments and to see which repairs are needed.
Expert estimates are imprecise; it is desirable to take the resulting uncer-
tainty into account when making the corresponding decisions. The tra-
ditional approach is to first apply the traditional statistical methods
to get the most accurate estimate and then to take the corresponding
uncertainty into account when estimating the economic consequences
of the resulting decision. On the example of pavement engineering appli-
cations, we show that it is beneficial to apply the economic approach
from the very beginning. The resulting formulas are in good accordance
with the general way how people make decisions in the presence of risk.

1 Formulation of the Problem

Need for Expert Estimates. In many practical situations, we use experts to


help make decisions.
In some cases – e.g., in medicine – we need experts because computer-based
automated systems are not yet able to always provide a correct diagnosis: human
medical doctors are still needed.
In other cases, the corresponding automatic equipment exists, but it is much
cheaper to use human experts. For example, in pavement engineering, in princi-
ple, we can use automatic systems to gauge the condition of the road surface, to
estimate the size of cracks and other faults, but the corresponding equipment is

still reasonably expensive to use, while a human grader can make these evalua-
tions easily. The use of human graders is explicitly mentioned in the corresponding
normative documents; see, e.g., [1] (see also [5]).
Expert Estimates Come with Uncertainty. Expert estimates usually come
with uncertainty. The experts’ estimates have, at best, the accuracy of about
10–15%, up to 20%; see, e.g., [3].
This observed accuracy is in perfect accordance with the well-known
“seven plus-minus two law” (see, e.g., [4,6]), according to which a person nor-
mally divides everything into seven plus-minus two – i.e., between 5 and 9 –
categories, and thus, has the accuracy between 1/9 ≈ 10% and 1/5 ≈ 20%.
Traditional Approach to Dealing with This Uncertainty. In the tradi-
tional approach to dealing with the expert uncertainty, we:

• first use the traditional statistical techniques to transform the expert opinion
into the most accurate estimate of the desired quantity, and then
• if needed, we gauge the economic consequences of the resulting estimate.

Limitations of the Traditional Approach. The main limitation of the tra-


ditional approach is that while our ultimate objective is economic – how to best
maintain the pavement within the given budget – we do not take this objective
into account when transforming the expert’s opinion into a numerical estimate.
What We Do in This Paper. In this paper, we show how to take economic
factors into account when producing the estimate. The resulting formulas are in
line with the usual way in which decision makers take risk into account.

2 Traditional Approach to Transforming Expert Opinion


into a Numerical Estimate: A Brief Reminder

Main Idea. An expert may describe his or her opinion in terms of a word from
natural language, or by providing a numerical estimate. For each such opinion –
be it a word or a numerical estimate – we can find all the cases when this expert
expressed this particular opinion, and in all these cases, find the actual value of
the estimated quantity q.
As a result, for each opinion, we get a probability distribution on the set of all
possible values of the corresponding quantity. This distribution can be described
either in terms of the corresponding probability density function (pdf) ρ(x), or
in terms of the cumulative distribution function (cdf) F(x) = Prob(q ≤ x).
In many real-life situations, the expert uncertainty is a joint effect of many
different independent factors, each of which may be small by itself. In such
cases, we can take into account the Central Limit Theorem, according to which
the distribution of the sum of a large number of small independent random
variables is close to Gaussian (normal); see, e.g., [7]. Thus, it often makes sense

to assume that the corresponding probability distribution is normal. For the


normal distribution with mean μ and standard deviation σ, we have
 
F(x) = F0((x − μ)/σ),

where F0 (x) is the cdf of the standard normal distribution – with mean 0 and
standard deviation 1.
Based on the probability distribution, we describe the most accurate numer-
ical estimate.
Details: How to Transform the Probability Distribution Reflecting the
Expert Opinion into a Numerical Estimate. We want to have an estimate
which is as close to the actual values of the quantity q as possible.
For the same opinion of an expert, we have, in general, different actual val-
ues q1 , . . . , qn . These values form a point (q1 , . . . , qn ) in the corresponding n-
dimensional space. Once we select a numerical value x0 corresponding to this
opinion, we will generate the value x0 in all the cases in which the expert has this
particular opinion. In other words, what we generate is the point (x0 , . . . , x0 ).
A natural idea is to select the estimate x0 for which the point (x0 , . . . , x0 )
is the closest to the point (q1 , . . . , qn ) that describes the actual values of the
corresponding quantity. In other words, we want to select the estimate x0 for
which the distance
d = √((x0 − q1)^2 + . . . + (x0 − qn)^2)

is the smallest possible.


Minimizing the distance is equivalent to minimizing its square

d^2 = (x0 − q1)^2 + . . . + (x0 − qn)^2.

Differentiating the expression for d^2 with respect to x0 and equating the deriva-
tive to 0, we conclude that

2(x0 − q1 ) + . . . + 2(x0 − qn ) = 0.

If we divide both sides of this equality by 2, move all the terms not related to
x0 to the right-hand side, and then divide both sides by n, we conclude that

x0 = μ,

where μ denotes the sample mean:


μ = (q1 + . . . + qn)/n.
In terms of the probability distribution, this is equivalent to minimizing the mean square value ∫ (x − x0)^2 · ρ(x) dx, which leads to x0 = μ = ∫ x · ρ(x) dx.
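A quick numerical check of this statement in Java, with hypothetical values q1, . . . , qn: the sample mean gives a smaller sum of squared deviations than any other candidate estimate.

public class MeanMinimizes {

    // Squared distance d^2(x0) = (x0 - q1)^2 + ... + (x0 - qn)^2
    static double squaredDistance(double x0, double[] q) {
        double sum = 0;
        for (double qi : q) sum += (x0 - qi) * (x0 - qi);
        return sum;
    }

    public static void main(String[] args) {
        double[] q = {2.1, 2.4, 1.9, 2.6, 2.0};  // hypothetical actual values for one expert opinion
        double mu = 0;
        for (double qi : q) mu += qi;
        mu /= q.length;                          // sample mean
        System.out.println("mu = " + mu + ",  d^2(mu) = " + squaredDistance(mu, q));
        System.out.println("d^2(mu + 0.3) = " + squaredDistance(mu + 0.3, q));  // larger, as expected
    }
}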

3 How to Estimate the Economic Consequences of


Selecting an Estimate: On the Example of Pavement
Engineering
Analysis of the Problem: Possible Faults and How Much it Costs to
Repair Them. In pavement engineering, we are interested in estimating the
pavement fault index x. When the pavement is perfect, this index is 0. The
presence of any specific fault increases the value of this index.
Repairing a fault takes money; the larger the index, the more costly it is to
repair this road segment. Let us denote the cost of repairs for a road segment
with index x by c(x).
We are interested in the case when the road is regularly repaired. In this
case, the index x cannot grow too much – once there are some faults in the road,
these faults are being repaired. Thus, the values of the index x remain small. So,
we can expand the unknown function c(x) into a Taylor series and keep only the first
terms in this expansion – e.g., only linear terms:

c(x) ≈ c0 + c1 · x.

When the road segment is perfect, i.e., when x = 0, no repairs are needed,
so the cost is 0: c(0) = 0. Thus, c0 = 0, and the cost of repairs linearly depends
on the index:
c(x) ≈ c1 · x. (1)

What is the Cost of Not Repairing a Road Segment? If we do not repair a


faulty road segment, then, because of the constant traffic load, in the next year,
the pavement condition will become worse.
Each fault worsens. Thus, the more faults we have now, the worse will be the
situation next year. Let g(x) denote the next-year index corresponding to the
situation when this year, the index is x.
Since, as we have mentioned, it makes sense to consider small values of x, we
can safely expand the function g(x) in Taylor series and keep only linear terms
in this expansion:
g(x) ≈ q0 + q1 · x.
When the pavement is perfect, i.e., when x = 0, we usually do not expect it to
deteriorate next year, so we should have g(0) = 0. Thus, we have q0 = 0, and g(x) ≈ q1 · x.
Since we did not repair the road segment this year, we have to repair it next
year. Next year, the index will increase from the original value x to the new
value x̃ = q1 · x. Thus, the cost of repairs will be c1 · x̃ = c1 · q1 · x.
This is the cost next year, so to compare it with the cost of this-year repairs,
we need to take into account that next year’s money is somewhat cheaper than
this year’s money: if the interest rate is r, we can invest a smaller amount
c1 · q1 · x / (1 + r)    (2)

now, and get the desired amount c1 · q1 · x next year. This formula (2) describes
the equivalent this-year cost of not repairing the road segment this year.
Combining These Costs: What is the Economic Consequence of
Selecting an Estimate. Once we select an estimate x0 describing the qual-
ity of the road segment, we perform the repairs corresponding to this degree.
According to the formula (1), these repairs cost us the amount c1 · x0.
If the actual value x is exactly equal to x0 , this is the ideal situation: the
road segment is repaired, and we spend exactly the amount of the money needed
to repair it. Realistically, the actual x is, in general, somewhat different from x0 .
As a result, we waste some resources.
When the actual value x of the pavement quality is smaller than x0 , this
means that we spend too much money on repairs: e.g., we bring on heavy and
expensive equipment while a simple device would have been sufficient. We could
spend just c1 ·x and instead, we spend a larger amount c1 ·x0 . Thus, in comparison
with the ideal situation, we waste the amount

c1 · x0 − c1 · x = c1 · (x0 − x). (3)

When the actual value x of the pavement index is larger than the estimate
x0 , this means that after performing the repairs corresponding to the value x0 ,
we still have the remaining fault level x − x0 which needs to be repaired next
year. The cost of these repairs – when translated into this year’s costs – can be
found by applying the formula (2): it is

c1 · q1 · (x − x0) / (1 + r).    (4)
The formulas (3) and (4) describe what will be the wasted amount for each x.
By multiplying this amount by ρ(x) and integrating over x, we get the following
expression for the expected value of the waste:
W(x0) = ∫_0^{x0} c1 · (x0 − x) · ρ(x) dx + ∫_{x0}^∞ [c1 · q1 · (x − x0) / (1 + r)] · ρ(x) dx.    (5)
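To make formula (5) concrete, here is a Java sketch that computes W(x0) by straightforward numerical integration – with hypothetical values of c1, q1, and r, and with a normal density used as an illustrative ρ(x) – and then locates the minimizing x0 by a simple scan; the analysis of this minimum is the subject of the next section.

public class ExpectedWaste {

    // Illustrative (hypothetical) parameter values:
    static final double C1 = 1.0, Q1 = 1.2, R = 0.05;  // cost slope, fault growth, interest rate
    static final double MU = 5.0, SIGMA = 1.0;         // normal density used as rho(x)

    static double rho(double x) {
        double z = (x - MU) / SIGMA;
        return Math.exp(-0.5 * z * z) / (SIGMA * Math.sqrt(2 * Math.PI));
    }

    // Expected waste W(x0) from formula (5), computed by a simple Riemann sum.
    static double waste(double x0) {
        double sum = 0, dx = 0.001;
        for (double x = 0; x < MU + 8 * SIGMA; x += dx) {
            double loss = (x <= x0) ? C1 * (x0 - x)                  // over-repair
                                    : C1 * Q1 * (x - x0) / (1 + R);  // deferred repair
            sum += loss * rho(x) * dx;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Scan candidate estimates x0; for q1 > 1 + r the minimizer lies above the mean MU.
        double bestX0 = 0, bestW = Double.MAX_VALUE;
        for (double x0 = MU - 2; x0 <= MU + 2; x0 += 0.01) {
            double w = waste(x0);
            if (w < bestW) { bestW = w; bestX0 = x0; }
        }
        System.out.println("numerically optimal x0 is about " + bestX0 + " (mean is " + MU + ")");
    }
}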

4 Towards Economically Optimal Estimates


Main Idea. Instead of selecting the statistically optimal estimate

x0 = μ = ∫ x · ρ(x) dx

and gauging the expected waste related to this estimate, let us use the
estimate that minimizes the waste (5).
Analysis of the Problem. To find the value x0 that minimizes the expres-
sion (5), let us differentiate this expression with respect to x0 and equate the
derivative to 0.

The expression (5) is the sum of two terms, so the derivative of the expression
(5) is equal to the sum of the derivatives of these two terms.
To find the derivative of the first term, it is convenient to introduce an
auxiliary function
G(t, x0) = ∫_0^t c1 · (x0 − x) · ρ(x) dx.    (6)

In terms of this auxiliary function, the first term has the form G(x0 , x0 ). Thus,
by the chain rule, the derivative of the first term can be described as

d/dx0 G(x0, x0) = ∂G(t, x0)/∂t |_{t=x0} + ∂G(t, x0)/∂x0 |_{t=x0}.    (7)

It is known that differentiation and integration are inverse operations. Since


(6) is an integral of some expression from 0 to t, its derivative with respect to t
is simply the value of the integrated expression for x = t:

∂G(t, x0)/∂t = c1 · (x0 − t) · ρ(t).
For t = x0 , this expression is equal to 0.
The second expression in the right-hand side of the formula (7) is an integral.
The derivative of the integral (i.e., in effect, of the weighted sum) is thus equal
to the integral (i.e., to the weighted sum) of the corresponding derivatives:
∂G(t, x0)/∂x0 = ∂/∂x0 ∫_0^t c1 · (x0 − x) · ρ(x) dx = ∫_0^t ∂/∂x0 (c1 · (x0 − x) · ρ(x)) dx.    (8)

The derivative of a linear function is simply the coefficient at the unknown x0 :



∂/∂x0 (c1 · (x0 − x) · ρ(x)) = c1 · ρ(x),
thus the expression (8) takes the form
∂G(t, x0)/∂x0 = ∫_0^t c1 · ρ(x) dx = c1 · ∫_0^t ρ(x) dx.
The integral in the right-hand side of this formula is simply the value of the cdf
F (t). So, for t = x0 , it takes the form F (x0 ). Thus:

d/dx0 G(x0, x0) = ∂G(t, x0)/∂x0 |_{t=x0} = c1 · F(x0).    (9)

To find the derivative of the second term in the right-hand side of the formula
(5), let us introduce another auxiliary function
H(t, x0) = ∫_t^∞ [c1 · q1 · (x − x0) / (1 + r)] · ρ(x) dx.    (10)

In terms of this auxiliary function, the second term in the expression (5) for
the waste function W (x0 ) has the form H(x0 , x0 ). Thus, by the chain rule, the
derivative of the second term can be described as

d/dx0 H(x0, x0) = ∂H(t, x0)/∂t |_{t=x0} + ∂H(t, x0)/∂x0 |_{t=x0}.    (11)

Since (10) is an integral of some expression from t to ∞, its derivative with


respect to t is simply minus the integrated expression:

∂H(t, x0)/∂t = −[c1 · q1 · (t − x0) / (1 + r)] · ρ(t).
For t = x0 , this expression is equal to 0.
For the second term in the right-hand side of the formula (11), the derivative
of the integral (i.e., in effect, of the weighted sum) is equal to the integral (i.e.,
to the weighted sum) of the corresponding derivatives:
∂H(t, x0)/∂x0 = ∂/∂x0 ∫_t^∞ [c1 · q1 · (x − x0) / (1 + r)] · ρ(x) dx = ∫_t^∞ ∂/∂x0 ([c1 · q1 · (x − x0) / (1 + r)] · ρ(x)) dx.    (12)

The derivative of a linear function is simply the coefficient at the unknown x0, which here is negative:

∂/∂x0 ([c1 · q1 · (x − x0) / (1 + r)] · ρ(x)) = −[c1 · q1 / (1 + r)] · ρ(x),

thus the expression (12) takes the form

∂H(t, x0)/∂x0 = −∫_t^∞ [c1 · q1 / (1 + r)] · ρ(x) dx = −[c1 · q1 / (1 + r)] · ∫_t^∞ ρ(x) dx.

The integral in the right-hand side of this formula is simply 1 minus the value of the cdf F(t). So, for t = x0, it takes the form 1 − F(x0). Thus, this derivative takes the following form:

∂H(t, x0)/∂x0 |_{t=x0} = −[c1 · q1 / (1 + r)] · (1 − F(x0)).    (13)

As we have mentioned, the derivative of the objective function (5) – the derivative
which should be equal to 0 when we select the economically optimal estimate
x0 – is equal to the sum of the expressions (9) and (13). Thus, the optimality
condition dW(x0)/dx0 = 0 takes the form

c1 · F(x0) − [c1 · q1 / (1 + r)] · (1 − F(x0)) = 0.

If we divide both sides of this equality by c1 , move all the terms not containing
the unknown F (x0 ) to the right-hand side, and divide by the coefficient at F (x0 ),
we conclude that
F(x0) = [q1 / (1 + r)] / [1 + q1 / (1 + r)] = q1 / (1 + r + q1).

Main Conclusion. As the estimate corresponding to the expert’s opinion, we


should select not the mean of the actual values corresponding to this opinion,
but rather a quantile corresponding to the level q1/(1 + r + q1):

F(x0) = q1 / (1 + r + q1),    (14)
where:

• q1 is the growth rate of the pavement fault – i.e., what a fault of index 1 will grow into by next year, and
• r is the interest rate – i.e., how much interest we will get if we invest $1 now.
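As a sanity check of the derivation, one can minimize the waste function (5) numerically for a concrete distribution and compare the resulting quantile level with formula (14). The following Python sketch does this under purely illustrative assumptions (a normal ρ(x) and made-up values of c1, q1 and r; none of these numbers come from the paper):

```python
# Numerical check of formula (14): for a concrete rho(x), the x0 that minimizes
# the waste function W(x0) from (5) should satisfy F(x0) = q1 / (1 + r + q1).
# All numbers below (mu, sigma, c1, q1, r) are illustrative choices only.
import numpy as np
from scipy import integrate, optimize, stats

mu, sigma = 50.0, 10.0          # hypothetical distribution of actual fault values
c1, q1, r = 2.0, 1.8, 0.03      # hypothetical cost, fault-growth and interest rates
rho = lambda x: stats.norm.pdf(x, mu, sigma)
F = lambda x: stats.norm.cdf(x, mu, sigma)

def waste(x0):
    # first term of (5): faults smaller than the estimate x0
    first, _ = integrate.quad(lambda x: c1 * (x0 - x) * rho(x), 0.0, x0)
    # second term of (5): faults larger than x0, grown by q1 and discounted by 1 + r
    second, _ = integrate.quad(lambda x: c1 * q1 * (x - x0) / (1 + r) * rho(x), x0, np.inf)
    return first + second

res = optimize.minimize_scalar(waste, bounds=(mu - 4 * sigma, mu + 4 * sigma), method="bounded")
print("numerical optimum  F(x0) =", round(F(res.x), 3))
print("formula (14) value       =", round(q1 / (1 + r + q1), 3))
```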

Discussion. If the fault growth is negligible, i.e., if q1 ≈ 1, then, taking into account that r is very small, we conclude that F(x0) ≈ 1/2, i.e., x0 should be the median of the corresponding probability distribution.
For symmetric distributions like the normal distribution, the median and the mean coincide – they both coincide with the center of the distribution, i.e., with the value with respect to which this distribution is symmetric. In this case, we can still use the statistically optimal estimate x0 = μ.
However, in most real-life situations, when \(q_1 \gg 1+r\), we have \(\frac{1+r}{q_1} \ll 1\), thus \(1 + \frac{1+r}{q_1} < 2\) and
$$F(x_0) = \frac{q_1}{1 + r + q_1} = \frac{1}{1 + \dfrac{1+r}{q_1}} > 0.5,$$
so we should select values larger than the mean.
For the case of the normal distribution, with \(F(x) = F_0\!\left(\frac{x-\mu}{\sigma}\right)\), the formula (14) takes the form
$$F_0\!\left(\frac{x-\mu}{\sigma}\right) = \frac{q_1}{1 + r + q_1},$$

i.e., the form
$$\frac{x-\mu}{\sigma} = k, \tag{15}$$

where k is the value for which
$$F_0(k) = \frac{q_1}{1 + r + q_1}. \tag{16}$$
Thus, instead of the statistically optimal estimate x0 = μ, we need to use the
estimate
x0 = μ + k · σ. (17)
This is in line with the usual way of taking risk into account when comparing
different alternatives: instead of comparing average gains μ, we should compare
the values μ − k · σ, where the coefficient k depends on the person’s tolerance to
risk; see, e.g., [2] and references therein.
Comment. We recommend plus k · σ, since instead of maximizing gains, we minimize losses – i.e., negative gains. When we switch from a value to the negative of this value, then μ + k · σ becomes μ − k · σ. Indeed, μ[−x] = −μ[x], while
σ[−x] = σ[x], so
μ[−x] + k · σ[−x] = −μ[x] + k · σ[x] = −(μ[x] − k · σ[x]).

Practical Recommendation for Pavement Engineering (and for Other


Similar Applications). For each expert opinion, we collect all the cases in
which the expert expressed this opinion, and find, in all these cases, the actual
values of the corresponding quantity. Based on these actual values, we compute
the mean μ and the standard deviation σ. Then, as a numerical description of
the expert’s opinion, we select the value μ + k · σ, where k is determined by the
formula (16).
This way, we can decrease the losses caused by the expert’s uncertainty.
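As an illustration, here is a minimal Python sketch of this recommendation, under the assumption that F0 in (16) is the standard normal cdf; the data and parameter values are hypothetical and not taken from the paper:

```python
# Minimal sketch of the practical recommendation above, assuming the actual
# values observed for a given expert opinion are available as a plain list and
# that F0 is the standard normal cdf, so k = the standard normal quantile of
# q1/(1+r+q1), as in formulas (16)-(17).  Names and numbers are illustrative.
import numpy as np
from scipy.stats import norm

def expert_opinion_estimate(actual_values, q1, r):
    """Numerical description of an expert opinion: mu + k*sigma, formula (17)."""
    mu = np.mean(actual_values)
    sigma = np.std(actual_values, ddof=1)   # sample standard deviation
    level = q1 / (1 + r + q1)               # quantile level from formula (14)
    k = norm.ppf(level)                     # k solves F0(k) = level, formula (16)
    return mu + k * sigma

# Hypothetical usage: past actual faults for one expert rating, growth rate q1, interest rate r
past_values = [42.0, 47.5, 51.0, 55.5, 60.0]
print(expert_opinion_estimate(past_values, q1=1.8, r=0.03))
```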

Acknowledgments. This work was supported in part by the US National Science


Foundation grant HRD-1242122 (Cyber-ShARE Center).

References
1. ASTM International: Standard Practice for Roads and Parking Lots Pavement Con-
dition Index Surveys, International Standard D6433-18
2. Elton, E.J., Gruber, M.J., Brown, S.J., Goetzman, W.N.: Modern Portfolio Theory
and Investment Analysis. Wiley, New York (2014)
3. Metropolitan Transportation Commission (MTC): MTC Rater Certification Exam,
Streetsaver Academy, San Francisco, California (2018)
4. Miller, G.A.: The magical number seven, plus or minus two: some limits on our
capacity for processing information. Psychol. Rev. 63(2), 81–97 (1956)
5. Park, K., Thomas, N.E., Lee, K.W.: Applicability of the international roughness
index as a predictor of asphalt pavement condition. J. Transp. Eng. 133(12), 706–
709 (2007)
6. Reed, S.K.: Cognition: Theories and Application. Wadsworth Cengage Learning,
Belmont, California (2010)
7. Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures.
Chapman and Hall/CRC, Boca Raton (2011)
Quantum Approach Explains the Need
for Expert Knowledge: On the Example
of Econometrics

Songsak Sriboonchitta1 , Hung T. Nguyen1,2 , Olga Kosheleva3 ,


Vladik Kreinovich3(B) , and Thach Ngoc Nguyen4
1
Faculty of Economics, Chiang Mai University, Chiang Mai, Thailand
songsakecon@gmail.com, hunguyen@nmsu.edu
2
Department of Mathematical Sciences, New Mexico State University, Las Cruces, NM 88003, USA
3
University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA
{olgak,vladik}@utep.edu
4
Banking University of Ho Chi Minh City, 56 Hoang Dieu 2, Quan Thu Duc, Thu
Duc, Ho Chi Minh City, Vietnam
Thachnn@buh.edu.vn

Abstract. The main purposes of econometrics are: to describe economic


phenomena, and to find out how to regulate these phenomena to get the
best possible results. There have been many successes in both purposes.
Companies and countries actively use econometric models in making eco-
nomic decisions. However, in spite of all the successes of econometrics,
most economically important decisions are not based only on the econo-
metric models – they also take into account expert opinions, and it has
been shown that these opinions often drastically improve the resulting
decisions. Experts – and not econometricians – are still largely in charge of the world economy. Similarly, in many other areas of human activi-
ties, ranging from sports to city planning to teaching, in spite of all the
successes of mathematical models, experts are still irreplaceable. But
why? In this paper, we explain this phenomenon by taking into account
that many complex systems are well described by quantum equations,
and in quantum physics, the best computational results are obtained
when we allow the system to make somewhat imprecise queries – the type of queries that experts ask.

1 Formulation of the Problem

Why Aren't We in Charge of the World Economy? Since Newton's equations were discovered, computing the trajectory of a celestial body or of a spaceship has become a purely computational problem.
There was a similar hope when the first equations were discovered for describ-
ing economic phenomena:

• that mathematical methods would enable us to predict and control economic


behavior as we control spaceships,
• that eventually, all the economic problems will be resolved by appropriate
computations,
• that eventually, econometricians – researchers who know how to solve the
corresponding systems of equations, how to optimize the desired objective
function – will be largely in charge of the world economy.
Since then, econometrics has experienced a lot of success stories, but, in spite of all these success stories, we are still not in charge: those in charge are experts, CEOs, fund managers, bankers – people who may know some mathematical models, but whose main strength is in their expertise, not in knowing these models.
Why? Why are econometricians not in charge of companies – after all, com-
panies are interested in maximizing their profits, so why not let a specialist in
maximization be in charge? The fact that this is not happening en masse shows
that, in spite of all the successes of econometrics, there is still a big advantage in using expert knowledge.
But why? We do not have an expert with an intuitive understanding of trajectories in charge of computing spaceship trajectories – so why is it different in economics?

Experts Are Needed Not Only in Economics. In many other areas of human activity, there is also a surprising need for experts.
For example, in sports, a few decades ago, new mathematical methods for sports were developed that drastically improved our understanding of sports phenomena and led to many team successes; see, e.g., [14]. At first, the impression was that the corresponding formulas provide a much better way of selecting team players than the experience of even the most experienced coaches. However, it soon turned out that relying only on the mathematical models is not a very effective strategy, and that much better results can be obtained if we combine the mathematical models with the experts' opinions; see, e.g., [9]. But why?
Same thing with smart cities. Cities often grow rather chaotically, with unin-
tended negative consequences of different decisions, so:
• why not have a computer-based system combining all city services,
• why not optimize the functioning of the city while taking everyone’s interests
into account?
This seems to be a win-win proposition. This was the original idea behind smart
cities. This idea indeed led to many improvements and successes – but it also
turned out that by themselves, the resulting mathematical models do not always
provide us with very good results. Much better results can be obtained if we take
expert knowledge into account; see, e.g., [19].
Yet another area where experts are still (surprisingly) needed is teaching.
Every time there is a new development in teaching technology, optimistic popu-
lar articles predict that these technologies, optimized by using appropriate math-
ematical models, will eventually replace human teachers. And they don’t.

• This was predicted when videotaped lectures appeared.


• This was predicted with current MOOCs – massive open online courses.
And these predictions turned out to be wrong. Definitely, teachers adopt new technologies, and these new technologies make teaching more efficient – but attempts
to eliminate teachers completely and let an automatic system teach have not yet
been successful.
Same with medical doctors: since the very first medicine-oriented expert sys-
tem MYCIN appeared several decades ago (see, e.g., [3]), enthusiasts have been
predicting that eventually, medical doctors will be replaced by expert systems.
Definitely, these systems help medical doctors and thus improve the quality of health care, but medical experts are still very much needed.
Similar examples can be found in many other areas of human activity. But
why are experts so much needed? Why cannot we incorporate their knowledge
into automated systems that would thus replace these experts?
Why Cannot We Just Translate Expert Knowledge into Computer-
Understandable Terms: Approaches Like Fuzzy Logic Helped, But
Experts Are Still Needed. Many researchers recognized the desirability of translating imprecise natural-language expert knowledge into computer-understandable terms. Historically, the first successful idea of such a translation was formulated by Lotfi Zadeh under the name of fuzzy logic [23]. This technique has indeed led to many successful applications; see, e.g., [2,11,15,16,18]; however, in spite of all these successes, experts are still needed. Why?
What We Do in This Paper. In this paper, we show that this unexpected need for expert knowledge can be explained if we take into account that many complex systems – especially systems related to econometrics and, more generally, to human behavior – are well described by quantum equations [6,22], equations that were originally invented to describe micro-objects of the physical world. And the experience of designing computers that take quantum effects into account has shown, somewhat unexpectedly, that the best results are attained if, instead of asking precise questions, we ask somewhat imprecise ones – we will explain this in detail in the following sections.

2 Quantum Equations and Quantum Computing: Brief


Reminder

Quantum Equations Are Helpful in Econometrics. Somewhat surpris-


ingly, quantum equations – originally developed for studying small physical
objects – have been shown to be useful in describing economic phenomena and,
more generally, any phenomena that involve human decision making; see, e.g.,
[1,10,13].

Let us Therefore Look at the Experience of Quantum-Related Decisions. In view of the above usefulness, when thinking of the best algorithms for making decisions in economics, it makes sense to look at how decisions are made – and how the corresponding computations are performed – in the quantum world.

The Main Idea of Quantum Computing: A Brief Reminder. To perform


more and more computations, we need to perform computations faster and faster.
In nature, there is a limitation on the speed of all possible physical processes:
according to modern physics, all the speeds are bounded by the speed of light –
c ≈ 300 000 km/sec. This may sound like a lot, but take into account that for a
typical laptop size of 30 cm, the smallest possible time that any signal needs to go from one side of the laptop to the other is 30 cm divided by c, which is about 1 nanosecond, i.e., 10^{-9} seconds. During this nanosecond, a usual several-gigahertz processor – and gigahertz means 10^{9} operations per second – performs several arithmetic operations. Thus, to make it even faster, we need to make processors
even smaller. To fit billions of cells of memory in a small-size computer requires
decreasing these cells to the size at which the size of each cell is of almost the
same order as the size of a molecule – and thus, quantum effects, i.e., physical
effects controlling micro-world, need to be taken into account.
The need to take quantum effects into account when computing was first
emphasized by the Nobelist Richard Feynman in his 1982 paper [5]. At first,
quantum effects were mainly treated as a nuisance. Indeed, one of the features of
quantum physics is its probabilistic nature:
• many phenomena cannot be exactly predicted,
• we can only predict the probabilities of different outcomes,
• and the probability that a computer will not do what we want makes the
computations less reliable.
However, later, it turned out that it is possible, as the saying goes, to make tasty
lemonade out of the sour and not-very-edible-by-themselves lemons that life gives
us: namely, it turned out that by cleverly arranging the corresponding quantum
effects, we can actually speed up computations – and speed them up drastically.

The Main Successes of Quantum Computing: A Brief Overview. The


first result showing potential benefits of quantum computing was an algorithm
developed by Deutsch and Jozsa ten years after Feynman's paper; see [4] (see also
[12] for a pedagogical description of this algorithm). This algorithm solved the
following simple-sounding problem:
• given a function f(x) that transforms one bit (0 or 1) into one bit,
• check whether this function is constant, i.e., whether f (0) = f (1).
This may sound like a simple problem not worth spending time on, but it
is actually a simple case of a very important practical problem related to high
performance computing. In many applications, we have developed software that
solves the corresponding system of partial differential equations: it takes as input
the initial conditions, the boundary conditions, and produces the results. Solving
such systems of equations often requires a lot of computation time; for example:
• accurately predicting tomorrow’s weather requires several hours on the fastest
modern high performance computer, and

• reasonably accurately predicting where the trajectory of a tornado will go in


the next 15 min takes even longer than several hours – thus making current
predictions practically useless.
One possible way of speeding up computation is based on the fact that:
• while we include all the inputs into our parameters,
• some of the current input's bits do not actually affect our results.
This is, by the way, one of the skills that physicists have – in situations like this,
figuring out which inputs are important and which can be safely ignored. But
even after utilizing all the physicists’ expertise, we probably have many bits of
data that do not affect tomorrow's weather, i.e., for which, whether we put
bit value 1 or bit value 0 into the corresponding computations, we will get the
exact same result:
f (. . . , 1, . . .) = f (. . . , 0, . . .).
Now we see that the original Deutsch-Jozsa problem is indeed the simplest
case of an important practical problem – important when computing the above
simple function f (x) takes a lot of computation time. If we operate within clas-
sical physics, then we have to plug in either 0 or 1 into the given “black box” for
computing f (x). If we only plug in 0 and not 1, we will know f (0) but not f (1)
– and thus, we will not be able to know whether the values f (0) and f (1) are
the same. To check whether the given function f (x) is a constant, we therefore
need to call the function f (x) two times.
An interesting result of Deutsch and Jozsa is that in quantum computing,
we can find the answer by using only one call to the function f (x) – in the next
section, we will explain how this is possible and how this is related to the need
for expert knowledge.
This result opened the floodgates for many other efficient quantum algo-
rithms. One of the first was Grover’s algorithm for a fast search in an unsorted
array [7,8]. The search problem is becoming more and more important every
day, with the increasing amount of data coming in. Ideally, we should sort all
this data – e.g., in alphabetic order – and thus make it easier to search, but in
practice, we often have no time for such sorting, and thus, store the data in a
non-sorted order, in memory cells

c1 , c2 , . . . , cn .

Suppose now that we want to find a record r in this database. For example,
suppose that an act of terror has happened, the surveillance system recorded
the faces of the perpetrators, and to help stop further attacks, we want to find if
these faces have appeared in any of the previously recorded surveillance video
recordings.
A natural way to find the desired record is to look at all n stored records one by one until we find the desired one. In this process, if we look at fewer than n records,
we may thus miss the record ci containing the desired information. Thus, in
the worst case, to find the desired record, we must spend time c · n = O(n),

where c is the average time needed to look into a single record – and within
classical physics, no faster algorithm is possible. Interestingly, Grover's quantum algorithm searches for the record much faster – in time proportional to √n.
There are many other known effective quantum algorithms. The most well
known is Shor’s fast factorization algorithm [20,21] that enables us to factorize
large integers fast. This sounds like an academic problem until one realizes that
most computer encryption that we use – utilizing the so-called RSA algorithm
– is based on the difficulty of factorizing large integers. So, if Shor’s algorithm
becomes practical, we will be able to read all the encrypted messages that have
been sent so far – this is why governments and companies all over the world try
to implement this algorithm.
Comments.

• Shor’s result would not mean, by the way, that with the implementation
of quantum computing, encryption will be impossible – researchers have
invented unbreakable quantum encryption algorithms which are, by the way, already used to convey important messages. These algorithms and many other quantum computing algorithms can be found in [17].
• In the following section, we will briefly mention how exactly quantum com-
puters achieve their speedup – and how this is related to the need for experts.

3 How Quantum Computers Achieve Their Speedup and


How This Explains the Need for Imprecise Expert
Knowledge

Superposition: A Specific Feature of Quantum World. One of the impor-


tant specific features of the quantum world is that in addition to classical (non-
quantum) states, we can have linear combinations (called superpositions) of these
states. This is a very non-intuitive notion; this is one of the reasons why Einstein objected to quantum physics: for example, how can one imagine a
superposition of a live cat and a dead cat? Intuitive or not, quantum physics has
been experimentally confirmed – while many more intuitive alternative theories
ended up being rejected by the experiments.
Let us thus illustrate this idea on the example of quantum states of a bit (a
quantum bit is also called a qubit). In non-quantum physics, a bit has two states:
0 and 1. In quantum physics, these states are usually denoted by |0⟩ and |1⟩.
In quantum physics, in addition to the two classical states |0⟩ and |1⟩, we also allow superpositions, i.e., states of the type c0 · |0⟩ + c1 · |1⟩, where c0 and
c1 are complex numbers. The meaning of this state is that when we read the
contents of this bit – i.e., if we try to measure whether we will get 0 or 1:

• we will get 0 with probability |c0 |2 , and


• we will get 1 with probability |c1 |2 .

Since we will always find either 0 or 1, these two probabilities must add up to 1:
|c0 |2 + |c1 |2 = 1. This is the condition under which the above superposition is
physically possible.
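To make this measurement rule concrete, here is a small Python sketch (our illustration, not part of the paper) that samples repeated measurements of a qubit in the state c0·|0⟩ + c1·|1⟩ and checks that the observed frequencies match |c0|² and |c1|²; the particular amplitudes are arbitrary choices:

```python
# Operational meaning of the superposition c0*|0> + c1*|1>: repeated measurements
# give 0 with probability |c0|^2 and 1 with probability |c1|^2.
import numpy as np

rng = np.random.default_rng(0)
c0, c1 = 1 / np.sqrt(3), np.sqrt(2 / 3) * 1j    # any amplitudes with |c0|^2 + |c1|^2 = 1
p0, p1 = abs(c0) ** 2, abs(c1) ** 2
assert np.isclose(p0 + p1, 1.0)                 # the "physically possible" condition

outcomes = rng.choice([0, 1], size=100_000, p=[p0, p1])
print("theoretical probabilities:", round(p0, 3), round(p1, 3))
print("observed frequencies:     ", round(np.mean(outcomes == 0), 3), round(np.mean(outcomes == 1), 3))
```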
How Superpositions Are Used in the Deutsch-Jozsa Algorithm. In the quan-
tum world, superpositions are “first-class citizens” in the sense that:
• whatever one can do with classical states, we can do with superpositions as
well.
In particular:
• just like we can use 0 and 1 as inputs to the algorithm f (x),
• we can also use a superposition as the corresponding input.
And this is exactly the main trick behind the Deutsch-Jozsa algorithm: instead of using a classical state (0 or 1) as the input, we use, as the input, the superposition state
$$\frac{1}{\sqrt{2}} \cdot |0\rangle + \frac{1}{\sqrt{2}} \cdot |1\rangle,$$
a state in which we can get 0 or 1 with equal probability
$$\left|\frac{1}{\sqrt{2}}\right|^2 = \frac{1}{2}.$$
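As a concrete illustration of this trick, the following Python simulation (our sketch, not code from the paper) uses the standard phase-oracle (phase kickback) formulation of the Deutsch-Jozsa check: prepare the superposition, apply the f-dependent phase oracle once, apply a Hadamard, and measure. In the simulation we naturally tabulate both values of f to build the oracle matrix, but the simulated circuit queries that oracle only once:

```python
# One-query Deutsch-Jozsa check for a one-bit function f, phase-oracle form.
# The measurement outcome is 0 exactly when f(0) = f(1), i.e. f is constant.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)        # Hadamard gate

def deutsch_jozsa_is_constant(f):
    state = H @ np.array([1.0, 0.0])                # |0> -> (|0> + |1>)/sqrt(2)
    # phase oracle |x> -> (-1)^f(x) |x>; tabulated here only because we simulate it
    oracle = np.diag([(-1) ** f(0), (-1) ** f(1)])
    state = H @ (oracle @ state)                    # single oracle application, then interference
    prob_0 = abs(state[0]) ** 2                     # probability of measuring 0
    return np.isclose(prob_0, 1.0)                  # True  <=>  f is constant

for f in (lambda x: 0, lambda x: 1, lambda x: x, lambda x: 1 - x):
    print(f(0), f(1), "-> constant?", deutsch_jozsa_is_constant(f))
```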

How Superpositions Are Used in Grover’s Algorithm. In the non-


quantum approach, all we can do is select an index i and ask the system to
check whether the i-th record contains the desired information. In contrast, in
quantum mechanics, in addition to submitting an integer i as an input to the
database, we can also submit a superposition of different indices:
c1 · |1⟩ + c2 · |2⟩ + . . . + ci · |i⟩ + . . . + cn · |n⟩,
as long as this superposition is physically meaningful, i.e., as long as all the
corresponding probabilities add up to 1:
|c1 |2 + |c2 |2 + . . . + |ci |2 + . . . + |cn |2 = 1.
This is exactly how Grover’s algorithm achieves its speedup – by having such
superpositions as queries.
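To see how such superposition queries pay off, here is a short Python state-vector simulation of Grover's algorithm (our illustrative sketch, not from the paper); with N = 256 unsorted records it locates the marked one after roughly (π/4)·√N superposition queries:

```python
# State-vector simulation of Grover's search over N = 2^n records with one marked index.
# Each "query" applies the oracle to a superposition of all indices at once.
import numpy as np

def grover_search(n_qubits, marked):
    N = 2 ** n_qubits
    state = np.full(N, 1 / np.sqrt(N))               # uniform superposition of all indices
    n_iter = int(np.floor(np.pi / 4 * np.sqrt(N)))   # about sqrt(N) oracle queries
    for _ in range(n_iter):
        state[marked] *= -1                          # oracle: flip the sign of the marked amplitude
        state = 2 * state.mean() - state             # diffusion: inversion about the average
    return int(np.argmax(np.abs(state) ** 2)), float(np.max(np.abs(state) ** 2))

index, prob = grover_search(n_qubits=8, marked=200)  # N = 256 records, record 200 is the target
print("most likely measurement:", index, "with probability", round(prob, 3))
```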

Comment. A similar idea underlies Shor's fast factorization algorithm. Namely, a usual way to factorize a large number N is to try all possible prime factors p ≤ √N. In Shor's algorithm, crudely speaking, instead of inputting a single prime number p into the corresponding divisibility-checking algorithm, we input an appropriate superposition of the states |p⟩ corresponding to different prime numbers.
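For reference, the classical √N baseline mentioned in this comment looks like the following simple sketch (our illustration, not part of the paper):

```python
# Classical baseline: factor N by trial division over candidates p <= sqrt(N),
# i.e. on the order of sqrt(N) divisibility checks -- the cost that Shor's
# quantum algorithm improves upon.
def trial_division(N):
    factors, p = [], 2
    while p * p <= N:                 # only candidates p <= sqrt(N) are needed
        while N % p == 0:             # one divisibility check per candidate
            factors.append(p)
            N //= p
        p += 1
    if N > 1:
        factors.append(N)             # whatever remains is itself prime
    return factors

print(trial_division(2019 * 1999))    # -> [3, 673, 1999]
```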

How All this Implies the Need for Experts. How can we interpret a
superposition input in commonsense terms? For example, in the search-in-the-
database problem:

• A traditional query would be to select an index i and to check whether the


i-th record contains the desired information.
• In quantum computing, we do not select a specific index i, the query may
affect several different indices with different probabilities.

This is exactly the same effect as when an expert asks something like “does one of the earlier records contain the desired information?” – meaning maybe record
No. 1, maybe record No. 2, etc. Of course, the result of this query is also prob-
abilistic (imprecise): we do not get the exact answer to this question, we get an
imprecise answer – which would correspond to something like “possibly”.
In other words, queries like the ones asked by quantum algorithms are very
similar to imprecise queries that real experts make. The main lesson of quantum
computing is thus that:

• normally, when we start with such imprecise queries, we try to make them
more precise (“precisiate” them, to use Zadeh’s term from fuzzy logic), while
• quantum computing shows that in many important cases, it is computation-
ally more beneficial to ask such imprecise queries than to ask precise ones.

In other words, quantum computing proves that combining precise computations


with imprecise expert-type reasoning is often beneficial – which explains the
somewhat surprising empirical need for such expert reasoning.

Acknowledgments. This work was supported by the Center of Excellence in Econo-


metrics, Faculty of Economics, Chiang Mai University, Thailand. We also acknowledge
the partial support of the US National Science Foundation via grant HRD-1242122
(Cyber-ShARE Center of Excellence).

References
1. Baaquie, B.E.: Quantum Finance: Path Integrals and Hamiltonians for Options
and Interest Rates. Cambridge University Press, New York (2004)
2. Belohlavek, R., Dauben, J.W., Klir, G.J.: Fuzzy Logic and Mathematics: A His-
torical Perspective. Oxford University Press, New York (2017)
3. Buchanan, B.G., Shortliffe, E.H.: Rule Based Expert Systems: The MYCIN Exper-
iments of the Stanford Heuristic Programming Project. Addison-Wesley, Reading
(1984)
4. Deutsch, D., Jozsa, R.: Rapid solution of problems by quantum computation.
Proc. R. Soc. Lond. A 439, 553–558 (1992)
5. Feynman, R.P.: Simulating physics with computers. Int. J. Theor. Phys. 21(6/7),
467–488 (1982)
6. Feynman, R., Leighton, R., Sands, M.: The Feynman Lectures on Physics. Addison
Wesley, Boston (2005)
7. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Pro-
ceedings of the 28th ACM Symposium on Theory of Computing, pp. 212–219
(1996)
8. Grover, L.K.: Quantum mechanics helps in searching for a needle in a haystack.
Phys. Rev. Lett. 79(2), 325–328 (1997)

9. Grover, T.S., Wenk, S.L.: Relentless: From Good to Great to Unstoppable. Scrib-
ner, New York (2014)
10. Haven, E., Khrennikov, A.: Quantum Social Science. Cambridge University Press,
Cambridge (2013)
11. Klir, G., Yuan, B.: Fuzzy Sets and Fuzzy Logic. Prentice Hall, Upper Saddle River
(1995)
12. Kosheleva, O., Kreinovich, V.: How to introduce technical details of quantum com-
puting in a theory of computation class: using the basic case of the Deutsch-Jozsa
Algorithm. Int. J. Comput. Optim. 3(1), 83–91 (2016)
13. Kreinovich, V., Nguyen, H.T., Sriboonchitta, S.: Quantum ideas in economics
beyond quantum econometrics. In: Anh, L., Dong, L., Kreinovich, V., Thach,
N. (eds.) Econometrics for Financial Applications, pp. 146–151. Springer, Cham
(2018)
14. Lewis, M.: Moneyball: The Art of Winning an Unfair Game. W. W. Norton, New
York (2004)
15. Mendel, J.M.: Uncertain Rule-Based Fuzzy Systems: Introduction and New Direc-
tions. Springer, Cham (2017)
16. Nguyen, H.T., Walker, E.A.: A First Course in Fuzzy Logic. Chapman and
Hall/CRC, Boca Raton (2006)
17. Nielsen, M., Chuang, I.: Quantum Computation and Quantum Information. Cam-
bridge University Press, Cambridge (2000)
18. Novák, V., Perfilieva, I., Močkoř, J.: Mathematical Principles of Fuzzy Logic.
Kluwer, Boston (1999)
19. Schehtner, K.: Bridging the adoption gap for smart city technologies: an interview
with Rob Kitchin. IEEE Pervas. Comput. 16(2), 72–75 (2017)
20. Shor, P.: Polynomial-time algorithms for prime factorization and discrete loga-
rithms on a quantum computer. In: Proceedings of the 35th Annual Symposium
on Foundations of Computer Science, Santa Fe, New Mexico, 20–22 November 1994
(1994)
21. Shor, P.: Polynomial-time algorithms for prime factorization and discrete loga-
rithms on a quantum computer. SIAM J. Sci. Statist. Comput. 26, 1484–1509
(1997)
22. Thorne, K.S., Blandford, R.D.: Modern Classical Physics: Optics, Fluids, Plasmas,
Elasticity, Relativity, and Statistical Physics. Princeton University Press, Princeton
(2017)
23. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)
Applications
Monetary Policy Shocks
and Macroeconomic Variables: Evidence
from Thailand

Popkarn Arwatchanakarn(B)

School of Management, Mae Fah Luang University, Chiang Rai, Thailand


popkarn.arw@mfu.ac.th

Abstract. From May 2000 up to the present day, Thailand has implemented a monetary policy of inflation targeting, with its central bank (the Bank of Thailand) using a short-term interest rate as the main monetary instrument. A question arises as to whether the short-term policy interest rate remains effective as the monetary policy instrument, given the current uncertainty of the global economy.
Using a structural vector error correction (SVEC) model with contemporaneous and long-run restrictions, this paper employs quarterly data for Thailand over the inflation targeting period 2000q2–2017q2 to investigate the relationship between monetary policy shocks and some key macroeconomic variables in Thailand under the operation of inflation targeting. This study finds significant feedback relations among the six variables in the specified SVEC model, namely real output, prices, interest rates, monetary aggregates, exchange rates and the trade balance. It also suggests that the effects of monetary policy on macroeconomic variables in Thailand are mostly consistent with theoretical expectations. The overall results provide support to the argument that price stability is required for sustained economic growth. More importantly, the policy interest rate remains valid and effective as the monetary instrument for price stability under inflation targeting.

Keywords: Inflation targeting · Monetary policy · SVEC model · Thailand · Trade balance

1 Introduction
The role of price stability and monetary policy independence in Thailand has
increased since the East Asian crisis of 1997–1998. Following an institutional reform of monetary policy, a managed-float exchange rate system and a rule-based monetary policy have been in operation. From May 2000 up to the present day, Thailand has implemented a flexible inflation targeting framework, with price stability as the ultimate objective of monetary policy.
Under the inflation targeting, understanding the directions and magnitude of the influences that drive the monetary transmission mechanism of an economy is

key to the successful conduct of monetary policy for price stability. The impact of monetary policy will be substantial if monetary policy is completely passed through to the target sectors of the economy, especially the price level and economic growth. In other words, an instrument of monetary policy will be more effective when the monetary transmission mechanism is well understood and well developed [3]. On the other hand, when either a monetary policy is not credible or monetary transmission channels do not work effectively, inflation targeting cannot anchor inflation expectations and therefore fails to achieve price stability. In the case of Thailand, the Bank of Thailand (BOT) has used a short-term interest rate as the main policy instrument to control inflation and adjust aggregate demand. The basic question arises as to whether the effects of a monetary policy shock on macroeconomic variables are in line with theoretical expectations. Another question is whether the policy interest rate remains valid and effective as the monetary policy instrument in a low-inflation environment and amid global financial turbulence.
The purpose of this study is to investigate the effects of monetary shocks
on some key macroeconomic variables in Thailand under the implementation
of inflation targeting. We establish a structural vector error correction (SVEC)
model and impose the long run neutrality of money and contemporaneous restric-
tions to identify the monetary policy shock. The hypothesis that the short-term
policy interest rate and money remain valid transmission mechanisms of Thai
monetary policy is tested. In addition, the effects of monetary policy on key
macroeconomic variables are examined. The impulse response functions and forecast error variance decompositions are then analysed to draw the empirical findings and policy implications.
The structure of this study is organised as follows. Section 2 reviews the
literature on the conduct of monetary policy and structural vector error cor-
rection (SVEC) models. Section 3 outlines an SVEC model for investigating the
dynamic interactions among real output, prices, interest rates, monetary aggre-
gate, exchange rates and trade variables in the presence of two exogenous vari-
ables, namely the world prices of oil and the US Federal Fund rates. Section 4
reports the empirical results, including the impulse response functions and fore-
cast error variance decompositions. Finally, Sect. 5 summarises the key findings and draws policy implications.

2 Literature Review

2.1 Background of Thailand Monetary Policy

Since the East Asian crisis of 1997–1998, the role of monetary policy for price
stability in Thailand has increased. As part of the implementation of the Inter-
national Monetary Fund’s (IMF) stabilisation programme in 1997, Thailand has
operated a managed floating exchange rate system with some episodic capital
controls. This has therefore given the Bank of Thailand (BOT) some indepen-
dence in the conduct of monetary policy for price stability.

For an institutionalisation of monetary policy, Thailand’s monetary policy


operated under monetary targeting during July 1997–May 2000. A monetary
aggregate was used as the instrument of monetary policy, and it acted as an
anchor for prices. Although the BOT’s operation under monetary targeting was
successful in stabilising the price level, concerns arose about slowing economic growth and a rising unemployment rate.
Since May 2000, the BOT has been conducting flexible inflation targeting
(IT) with the price stability as the ultimate objective of monetary policy. In
turn, the IT policy has replaced the independent monetary policy. In addition,
it has been more relevant in achieving price stability and more effective in anchoring inflation expectations than other monetary policies. The reason behind the
abandonment of monetary targeting was the alleged instability of the demand
for money function and the claim that it had loosened the relationship among
money, output and prices [13]. Operating inflation targeting, the BOT discarded the monetary aggregate and has employed a short-term policy interest rate as the
main monetary instrument to adjust aggregate demand in order to maintain
both the price and output stability.
Consequently, inflation targeting has been instrumental in advancing institutional reforms and the effectiveness and credibility of central banks. Optimal inflation targeting would improve the conduct of monetary policy, leading to price stability, which is a basis for sustained economic growth. Under inflation targeting, the Bank of Thailand has successfully maintained inflation within its target range and lowered vulnerability to external shocks. Overall, the Thai economy has had a satisfactory performance in terms of economic growth and inflation [10].
Although inflation targeting has been successfully implemented in Thailand,
problems associated with the interest-based transmission mechanism have been suggested [10,22]. An abandonment of monetary aggregates in the conduct of mon-
etary policy is not without cost to an economy [15]. In particular, the policy
interest rate has effectively made monetary aggregates redundant, despite the
fact that monetary aggregates have been important in the inflationary process
of developing countries1 . It is considered unwise to ignore monetary aggregates
in the conduct of monetary policy because there is a dynamic relation among
the real output, prices, interest rates, monetary aggregate and exchange rates
[2,13]. One consequence of the operation of inflation targeting in Thailand is
the creation of money-growth volatility, which has kept inflation volatile and, in
turn, made both the real interest rates and real exchange rates volatile [13]. In
addition, an increase in inflation raises the volatility of inflation, which affects
Thailand’s economic growth [14].

1
There is an argument that central banks do not, and cannot, impose control over
the long-term interest rate without controlling the money growth rate. The use of a
short-term interest rate as the monetary instrument makes the money-growth rate
unstable. Unstable money growth makes the inflation and hence the interest rate
unstable [2, 13].

2.2 Related Literature

2.2.1 Theoretical Review


Two hypotheses dominate the relation between monetary policy, exchange rates and the trade balance in an open economy. The first hypothesis is the over-
shooting hypothesis, which involves monetary policies and exchange rates.
According to this hypothesis, a contractionary (expansionary) monetary pol-
icy results in a large initial appreciation (depreciation) of an exchange rate,
followed by subsequent depreciation (appreciation). Empirical evidence on the
exchange rate overshooting has been debatable. Grilli and Roubini [11] found
that a contractionary monetary policy initially produces a gradual appreciation,
which was then followed by a gradual depreciation. Their finding is not consis-
tent with in line of exchange rate overshooting. However, Jang and Ogaki [17]
found evidence to support the overshooting hypothesis.
The second hypothesis is the J-curve hypothesis, which relates between mon-
etary policy and the trade balance. According to the J-curve hypothesis, a real
depreciation (appreciation) of an exchange rate lowers (increases) the relative
price of domestic goods to foreign goods, which, in turn, increases (decreases)
exports and decreases (increases) imports. As a result, the trade balance gets an
improvement (deterioration). The process of J-curve takes time, it is not imme-
diate. Empirical studies that support the J-curve hypothesis include Kim [18]
and Koray and McMillin [20].

2.2.2 Review on SVAR and SVEC Model


There is a growing literature on monetary policy analysis using structural vec-
tor autoregression (SVAR) and structural vector error correction (SVEC) mod-
els. The SVAR model is a standard econometric model in dynamic macroeco-
nomic analysis. It has been extensively used to analyse monetary issues, especially the monetary transmission mechanism [6]. However, the existing SVAR studies
have weaknesses that come from some issues, such as non-stationary variables
and short sample spans. These problems might generate unreliable, misleading
results and economic puzzles. Due to some limitations of the SVAR model, a
structural vector error correction (hereafter SVEC) model has recently emerged
as a new analytic tool for investigating the relationship between monetary policy
and macroeconomic variables.
Employment of an SVEC model has remained a challenge and has been con-
troversial in monetary policy analysis. The SVEC model, originally developed by
King, Plosser, Stock and Watson [21], differs somewhat from the SVAR model.
Whereas the SVAR requires only short-run (contemporaneous) restrictions, the
SVEC model requires both short-run (contemporaneous) and long-run restric-
tions. Faust and Leeper [7] criticise the weakness of model estimation with only short-run or only long-run restrictions. They also recommend
using both short-run and long-run restrictions to improve estimations. As stated
in Fung and Kasumovich [9], it is possible to impose identification schemes on
the cointegration matrix of a VECM. In addition, Jang and Ogaki [17] point out that the SVEC model has some advantages in systems with stochastic trends and cointegration. It can be inferred that estimators from the SVEC model are more precise than those from the SVAR model.
The SVEC model would be superior to the SVAR model in addressing some issues. Firstly, it allows the use of cointegration restrictions, which makes it possible to impose long-run restrictions in order to identify shocks. Long-run restrictions are more attractive because they are more directly related to the macroeconomic model. The cointegration properties of the variables provide restrictions, which can be taken into account beneficially in identifying the structural shocks. It also allows the imposition of restrictions on both short- and long-run relationships. Secondly, it allows us to incorporate both the I(0) and I(1) nature of the data and empirically supported cointegrating relationships within the same modelling framework [8,24]. That is, an SVEC model requires fewer restrictions than an SVAR. Thirdly, it can impose restrictions about the underlying structure of
the economy and would provide a better fit to the data. Therefore, it is capable
of investigating how the economy works in response to monetary policy shocks
and other shocks.
In Thailand, a number of monetary studies have extensively used VAR and
SVAR to investigate Thailand’s transmission mechanism of monetary policy.
Examples of VAR literature for Thailand include the studies of Chareonseang
and Manakit [3]; Disyatat and Vongsinsirikul [5]; and Hesse [12]. In addition, a few studies utilize the SVAR model for the analysis of Thailand's monetary policy,
for instance Arwatchanakarn [1]; Hossain and Arwatchanakarn [15]; Kubo [22]
and Phiromswad [26]. However, the employment of an SVEC model is somewhat
limited and remains challenging. The empirical studies on monetary policy issues
using an SVEC model include Arwatchanakarn [2] and Chucherd [4].
It would be beneficial to employ the SVEC model for analysing monetary
policy issues in Thailand. Therefore, the present study aims to establish an SVEC
model to analyse the relationship between the monetary policy shocks and some
key macroeconomic variables for Thailand. To accomplish this, both plausible
short-run and long-run restrictions, which are based on economic theory and
previous studies, are imposed on the specified SVEC model.

3 Methodology
3.1 Interrelations Among the Interest Rate, Monetary Aggregate,
Real Output, Price Level, Exchange Rate and Trade Balance

Thailand is a small open economy, operating under a managed-float exchange


rate system with some episodic controls over capital flows. The Bank of Thailand currently uses a short-term policy interest rate, as the instrument of monetary policy, to stabilise the price level and the economy without affecting economic growth. Following the existing literature on monetary policy under interest-based inflation targeting, an SVEC model is considered useful for analysing the monetary transmission mechanism and the mechanics of an economy,

namely real output (Y), prices (P), monetary aggregates (M), the interest rates
(PR), the exchange rates (ER) and trade balance (TB).
Accordingly, a six-variable SVEC model is established to investigate the inter-
actions among all above defined variables under an exogeneity assumption that
Thailand remains exposed to shocks originating from two external variables,
namely world oil prices and foreign interest rates.2 For modelling purposes, two
external variables are assumed to affect domestic variables, but they are not
affected by domestic variables. Figure 1 illustrates the interrelations among the
external and domestic variables that can be expected for Thailand.

Fig. 1. The 6-variables SVEC model: The interrelation among the interest rates, mone-
tary aggregates, prices, real output, exchange rates and trade balance with two external
shocks, e.g. the world price of oil and the foreign interest rate.

3.2 An SVEC Model Specification and Identification of Monetary


Policy Shocks
This section briefly provides an overview of the structural vector error correction (SVEC) model. Following Lütkepohl [23], a reduced-form vector error correction (VEC) model can be specified. There are benefits in utilising the cointegrating properties of the variables. Therefore, it is interesting and useful to deploy an SVEC model in the analysis of monetary policy.
Consider a reduced-form VAR model
$$A(L)\,y_t = c + e_t \tag{1}$$
where A(L) is a matrix polynomial in the lag operator L, y_t is an n × 1 vector of variables, c is an n × 1 vector of constants (intercepts), and e_t is white noise with zero mean and covariance matrix Σ_e.

2
The variables in an SVEC model represent a multivariate system of endogenous vari-
ables, which maintain dynamically feedback relations. For identification purposes,
some restrictions are imposed on the long-run and the short-run (or contemporane-
ous) relations among variables in the SVEC model.

Assuming that all variables are at most I(1), the VEC model with cointegration rank r is represented in the following form:
$$\Delta y_t = \alpha\beta' y_{t-1} + \Pi_1 \Delta y_{t-1} + \dots + \Pi_{p-1} \Delta y_{t-p+1} + e_t \tag{2}$$
In the case of cointegration, the matrix αβ′ is of reduced rank (r < n) and αβ′y_{t−1} represents the error correction term. The dimensions of the α and β matrices are n × r, where r is the cointegration rank. More specifically, α and β contain the adjustment coefficients and the cointegration vectors, respectively. The Π_i are n × n reduced-form short-run coefficient matrices.
The structural form of Eq. (2) is given by
$$H\Delta y_t = \Gamma y_{t-1} + \Phi_1 \Delta y_{t-1} + \dots + \Phi_{p-1} \Delta y_{t-p+1} + u_t \tag{3}$$
where H represents the contemporaneous coefficient matrix, the Φ_i are structural-form short-run coefficient matrices, and u_t are the structural innovations. In the SVEC model, the reduced-form disturbances e_t are linearly related to the structural innovations u_t through the contemporaneous (or short-run, SR) relation
$$e_t = H^{-1} u_t \tag{4}$$
Suppose that the process y_t is influenced by two types of structural disturbances: those with a permanent impact and those with a transitory impact. To extract this information about the process y_t, the vector moving average (VMA) representation of Eq. (3) is used:
$$y_t = \eta(1)\sum_{i=1}^{t} e_i + \eta(L)e_t + y_0 \tag{5}$$
Notice that y_t contains the initial condition (y_0), the transitory shocks (through η(L)) and the permanent shocks (through η(1)). The transitory part is the infinite summation η(L) = Σ_{j=0}^{∞} η_j L^j, which converges to zero. This suggests that the transitory shocks do not have a long-run impact. The matrix η(1) expresses the long-run impact of the permanent shocks and is given by
$$\eta(1) = \beta_{\perp}\left[\alpha_{\perp}'\left(I_n - \sum_{i=1}^{p-1}\Pi_i\right)\beta_{\perp}\right]^{-1}\alpha_{\perp}' \tag{6}$$

Substituting e_i = H^{-1}u_i from Eq. (4), the permanent component of Eq. (5) can be rewritten as
$$\eta(1)\sum_{i=1}^{t} e_i = \eta(1)H^{-1}\sum_{i=1}^{t} u_i \tag{7}$$

The transitory shocks can be identified by placing zero restrictions on the H^{-1} matrix. These restrictions imply that some shocks do not have a contemporaneous effect on some variables in the system.
This study deploys the SVEC model with long-run and short-run (or contemporaneous) restrictions. Having identified the restrictions, the six endogenous

variables are ordered as follows: the policy interest rate (PR); monetary aggre-
gate (M); prices (P); real output (Y); exchange rates (ER) and trade balance
(TB). This study follows the identification scheme of Ivrendi and Guloglu [16]
and Kim and Roubini [19]. Our SVEC model treats the monetary policy shock as transitory, while the other shocks are permanent3.
For a locally just-identified SVEC model [23,25], a total of 15 (= n(n − 1)/2) restrictions is required. The long-run and contemporaneous restrictions are represented in the two following matrices. Apart from the cointegration structure, there are r transitory shocks, which provide r(n − r) = 5 restrictions on the long-run matrix for the permanent shocks. The policy interest rate is assumed to be transitory, and this variable undertakes the adjustment required for the cointegrating relationship to hold.
The transitory shock is described by a zero column in the long-run matrix4 (LR), as shown in Eq. (8) below:
$$LR = \eta(1)H^{-1} =
\begin{bmatrix}
0 & * & * & * & * & * \\
0 & * & * & * & * & * \\
0 & * & * & * & * & * \\
0 & * & * & * & * & * \\
0 & * & * & * & * & * \\
0 & * & * & * & * & *
\end{bmatrix} \tag{8}$$

Assuming that the cointegrating rank (r) is equal to 1, there is only one transitory shock and there are five permanent shocks. The transitory shock is identified without further restrictions (r(r − 1)/2 = 0). However, the permanent shocks are identified by requiring at least 10 (= (n − r)(n − r − 1)/2) additional restrictions, which are imposed on the contemporaneous matrix. Following Kim and Roubini [19], we impose twelve contemporaneous restrictions in total. The likelihood ratio test indicates that the over-identifying restrictions are not rejected. The relation between the pure shocks and the reduced-form shocks can be expressed in
the contemporaneous form (SR) in Eq. (9):
$$SR:\; e_t = H^{-1}u_t, \qquad H^{-1} =
\begin{bmatrix}
* & 0 & * & 0 & 0 & 0 \\
* & * & * & * & 0 & 0 \\
* & * & * & 0 & 0 & 0 \\
* & * & * & * & 0 & 0 \\
* & * & * & * & * & 0 \\
* & * & * & * & * & *
\end{bmatrix} \tag{9}$$

The first row expresses the monetary policy reaction function. This study
assumes that the Bank of Thailand (BOT) sets the policy interest rate (PR)
3
We consider only one cointegrating relation. The reason behind this consideration is that we treat the monetary policy shock as the only transitory shock. The
identification of the long-run restrictions is also based on the assumption of money
neutrality that a monetary policy does not permanently affect real variables in the
long-run.
4
The zeros are the restricted elements and the asterisks are the unrestricted elements.

after observing the price level (P) and two exogenous variables. However, the
BOT is assumed to respond to real output (Y) and the monetary aggregate (M) with a delay. The second row represents the money demand equation. It is assumed to contemporaneously react to the policy interest rate (PR), prices (P) and real output (Y). The third row represents the price equation. Prices are contemporaneously influenced by the policy interest rate (PR) and the monetary aggregate (M). The fourth
row stands for the output equation. It is assumed to contemporaneously respond
to the policy interest rate (PR), monetary aggregate (M) and prices (P). The
fifth row represents the exchange rate equation. It is contemporaneously affected
by all above variables except the trade balance (TB). The sixth row represents
the trade balance, which is contemporaneously affected by all variables in the
system.
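For readers who want to keep track of the identification scheme, here is a small Python sketch (our illustration only; the paper's estimation is carried out with the R program of Pfaff [25]) that encodes the zero patterns of the long-run matrix (8) and the contemporaneous matrix (9) and counts the restrictions:

```python
# Bookkeeping sketch of the restriction patterns in Eqs. (8) and (9):
# 0 marks a restricted element, nan marks a free ("*") element.
import numpy as np

names = ["PR", "M", "P", "Y", "ER", "TB"]   # variable ordering used in the text
free = np.nan

# Long-run matrix (8): the first (monetary policy) shock is transitory,
# so its column of long-run effects is a zero column.
LR = np.full((6, 6), free)
LR[:, 0] = 0.0

# Contemporaneous matrix (9): zeros follow the row-by-row description above.
SR = np.array([
    [free, 0,    free, 0,    0,    0   ],   # policy rate: reacts to P within the quarter
    [free, free, free, free, 0,    0   ],   # money demand: PR, M, P, Y
    [free, free, free, 0,    0,    0   ],   # prices: PR, M
    [free, free, free, free, 0,    0   ],   # output: PR, M, P
    [free, free, free, free, free, 0   ],   # exchange rate: everything except TB
    [free, free, free, free, free, free],   # trade balance: everything
])

# A full zero column; the text counts r(n - r) = 5 of these as identifying restrictions.
print("long-run zeros:        ", int(np.sum(LR == 0)))
# 12 contemporaneous zeros, vs. the 10 required -> over-identified, as stated in the text.
print("contemporaneous zeros: ", int(np.sum(SR == 0)))
```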

3.3 The Data Sources and Definitions

The six-variable SVEC model is estimated using quarterly data for Thailand5 .
The period of model estimation ranges from 2000q2 to 2017q2, covering the implementation period of the inflation targeting framework.
The variables used in the specified model are as follows. The policy interest rate (PR) is determined by the Bank of Thailand (BOT). The monetary aggregate (M) is measured by the log of the narrow monetary aggregate. Prices (P) are measured by the log of the consumer price index (2010 = 100). Real output (Y) is measured by the log of real gross domestic product. The exchange rate (ER) is measured by the log of the real effective exchange rate (2010 = 100). The foreign interest rate is the U.S. federal funds rate. The trade balance (TB) is measured as the logarithm of the ratio of exports to imports6. The main sources of data are the International Financial Statistics of the IMF and the Bank of Thailand. In addition, the world oil prices (WOP) are compiled from the Federal Reserve Bank of St. Louis.

3.4 Unit Root Test and Cointegration Analysis

The first step in our analysis is to examine the time-series properties of the
variables. In general, the augmented Dickey-Fuller (ADF) and the Kwiatkowski,
Phillips, Schmidt and Shin (KPSS) tests7 are commonly performed. The results
suggest that most variables under consideration have a unit root in level form but are stationary in first-order (log-)difference form. These results imply that all the variables are I(1).

5
In the estimation procedure, this study uses the R program by Pfaff [25].
6
Since the trade balance, which is the difference between exports and imports, might take negative values, it cannot be log-transformed directly.
7
The ADF test is based on the null-hypothesis that the series under testing has a unit
root. The KPSS test is based on the hypothesis that the series under consideration
is stationary and hence does not have a unit root.

The second step is to determine the number of cointegrating vectors. This study uses the Johansen cointegration approach to examine the cointegrating relationships among the variables. The Akaike information criterion is used to select lag lengths. Four lags were sufficient to capture the dynamics of all the variables. The trace and the maximum eigenvalue tests indicate four cointegrating relations among the six variables. However, this study considers only one cointegrating relationship (r = 1).
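For illustration, the same two preliminary steps can be reproduced in Python with statsmodels (a sketch under our own assumptions about how the data are stored; the file name and column layout are hypothetical, and the paper's actual computations use the R program of Pfaff [25]):

```python
# Unit-root and Johansen cointegration checks for the six endogenous series.
# `data` is assumed to be a pandas DataFrame of the series defined in Sect. 3.3.
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.vector_ar.vecm import coint_johansen

data = pd.read_csv("thailand_quarterly.csv", index_col=0)   # hypothetical file

# Step 1: ADF (H0: unit root) and KPSS (H0: stationarity) tests for each series
for col in data.columns:
    adf_p = adfuller(data[col].dropna())[1]
    kpss_p = kpss(data[col].dropna(), regression="c")[1]
    print(f"{col}: ADF p-value = {adf_p:.3f}, KPSS p-value = {kpss_p:.3f}")

# Step 2: Johansen test with 3 lagged differences (i.e. 4 lags in levels, as
# selected by the AIC in the text) and a constant term (det_order=0).
jres = coint_johansen(data.values, det_order=0, k_ar_diff=3)
print("trace statistics:         ", jres.lr1.round(2))
print("5% critical values:       ", jres.cvt[:, 1].round(2))
print("max-eigenvalue statistics:", jres.lr2.round(2))
```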

4 Empirical Results
4.1 Impulse Response Functions
Having estimated the identified model, the impulse responses for all six variables
are generated and reported in Figs. 2 and 3. The focus is on the impulse responses
of key macroeconomic variables to two monetary policy shocks. As we expected,
the impulse response functions show the interrelations among all variables.
Figure 2 presents the impulse responses to two monetary policy shocks,
namely policy interest rate and monetary aggregate, together with the upper
and lower confidence intervals. Empirical results are shown as follows.
The first column in Fig. 2 reveals the responses of the variables specified in our model to a contractionary policy shock, i.e., a rise in the policy interest rate.
In response to the shock, the monetary aggregate (M), prices (P) and real output
(Y) fall initially and, after that, they rise to their initial baselines. These results
are consistent with theoretical expectations and the specified SVEC model does
not generate evidence of the liquidity, output and price puzzles. In the case of
exchange rate (ER), the effect of a contractionary policy is a depreciation of the
domestic currency. This is evidence that the exchange rate puzzle still exists in
our model. In response to the shock, the trade balance (TB) initially improves8 .
After the initial rise, it starts to fall, followed by a mean reversion to its pre-shock level. These results imply that the short-term policy interest rate
remains a valid and effective monetary policy instrument for achieving price
stability and improving the international trade.
The second column in Fig. 2 reveals the responses of the variables specified in our model to an expansionary policy shock, i.e., an increase in money. In response
to this expansionary policy shock, the interest rate (PR) initially falls and, after
that, increases to above its pre-shock level. The effect of this expansionary policy
on the price level (P) is negative and this is not consistent with the theoretical
expectations9 . In response to this expansionary policy shock, the real output
8
The improvement of the trade balance is driven by the relatively strong import
contraction. The reason behind this is that the contractionary monetary policy shock
shrinks output and, in turn, reduces import demand. This effect is called the 'income absorption effect'.
9
The impulse response functions that contradict theory predictions are known as
empirical puzzles that are often found in monetary literature. These anomalies may
come from modelling issues e.g. identification and data limitations with a short time
span.

(Y) increases initially and over time, whereas the real effective exchange rate (ER) initially depreciates and is followed by a mean reversion to its pre-shock level. In addition, this expansionary policy shock causes the trade balance (TB) to worsen initially and over time10. Even though money is not an appropriate monetary instrument for maintaining price stability, it could be a supplementary instrument for stimulating economic growth and improving international trade under inflation targeting.
In addition, the responses of two monetary instruments (the policy inter-
est rate and monetary aggregate) to target variables (prices, output and trade
balance) are presented in Fig. 3. The first column represents the responses of
the policy interest rate to shocks on target variables. First, in response to a
price shock, the policy interest rate promptly increases and remains above its
pre-shock level over time. This reveals supporting evidence that the Bank of Thailand (BOT) has the primary objective of maintaining price stability and acts as an inflation fighter under the inflation targeting framework. Second, in response
to an output shock, the policy interest rate falls initially and over time. This
result does not provide a clear indication that the Bank of Thailand pursues an
objective of sustained output. Third, in response to a trade balance shock, the
policy rate initially increases and falls below its pre-shock level. However, the
adjustment of the policy interest rate is conducted as a key monetary instrument
in managing inflation and sustaining output.
The second column in Fig. 3 represents the responses of the monetary aggregate to shocks on the target variables. First, in response to a price shock, the monetary aggregate increases initially and over time. This response of money is consistent with the quantity theory of money (QTM), under which a rise in prices drives money growth when the velocity of circulation and output are constant. Second, in response to an output shock, the money initially increases and then reverts to its pre-shock level. This result is also consistent with the QTM and money market equilibrium. Third, in response to a trade balance shock, the money initially falls and remains below its pre-shock level over time. This is not in line with theoretical expectations. However, one policy implication emerges: the monetary aggregate could be an optional instrument for controlling inflation and stimulating output, at least in the short run.
4.2 Forecast Error Variance Decompositions

In addition to the impulse responses, we examine the forecast error variance decompositions of the macroeconomic variables. The variance decompositions for the SVECM are reported in Fig. 4. The main results are as follows.
First, as for the policy interest rate, the dominant source of its fluctuation is the variance of the price level.
10 The reason behind this is that an expansionary monetary policy raises domestic income and, in turn, increases import demand, which leads to a deterioration of the trade balance.
Fig. 2. Impulse responses to two monetary policy shocks in an SVEC model
Fig. 3. Impulse responses of the interest rate and monetary aggregate to the prices,
real output and trade balance shock in an SVEC model
Second, the variance decompositions of the monetary aggregate indicate that
the price level is the dominant source of the fluctuation in money. In addition,
the policy interest rate is the second key determinant of monetary aggregate
fluctuation even in the short-run.
Third, in the case of prices, their fluctuation originates mainly from the variance of the monetary aggregate over a one-year horizon. The policy interest rate contributes about 14% to the fluctuation of prices in the first quarter. This implies that changes in the monetary aggregate (or the policy interest rate) might cause prices to be unstable, at least in the short run.
Fourth, most output fluctuation is explained by output's own shock, which accounts for more than 80%. Price shocks also play a significant role, contributing around 7 to 13% of that fluctuation.
Fig. 4. Forecast error variance decomposition for the 6-variables SVEC model
Fifth, in the case of the exchange rate, most of its fluctuation is explained by the exchange rate's own shock, although its role decreases over time. The real output shock is the second largest source of exchange rate fluctuation in the medium and long run.
Lastly, the variance decompositions of the trade balance suggest that, after the trade balance's own shock, exchange rate shocks have a significant impact on trade balance fluctuation in the short run. However, in the long run, price shocks are the second key factor in the fluctuation of the trade balance.
The overall results show that the variance of the price level appears to be an important factor in the variances of all variables in the specified model. This supports the view that price stability is also an important requirement for improving the trade balance and sustaining output in the long run.
Overall, the empirical results, which are generated by the identified SVEC
model, show that there are feedback relations among all six variables in the sys-
tem, especially among real output, prices and monetary aggregate. The external
shocks transmit to the domestic economy contemporaneously and dynamically.
The main finding is that, under inflation targeting, adjusting the policy interest
rate is conducted as a key instrument of monetary policy in keeping inflation
and output stable. In other words, the policy interest rate remains an effective
monetary instrument for achieving price stability and improving international
trade. Even though the monetary aggregate is not an appropriate monetary
instrument to maintain price stability, it could be a supplementary instrument
for stimulating economic growth and improving international trade in the short
run.

5 Conclusions and Policy Implications

This paper has investigated the relationship between monetary policy shocks and
key macroeconomic variables in Thailand using quarterly data over the
inflation targeting period of 2000q2–2017q2. The dynamic interrelations among
these variables are analysed by estimating a six-variable SVEC model, which
consists of real output, prices, interest rates, monetary aggregates, exchange
rates and trade balance, in the presence of two external shocks, namely the world
price of oil and the US federal funds rate. The overall results suggest significant feedback relations among the six endogenous variables in the specified model. The overall responses of the macroeconomic variables obtained from the SVEC model are consistent with common theoretical expectations.
An important finding from this study is that a contractionary policy has an
important effect on prices in Thailand. It reveals that monetary policy affects
the prices, real output and trade balance at least in the short run. In addi-
tion, the empirical results provide confirmatory evidence that price stability is
essential for sustaining economic growth and improving international trade. More importantly, under inflation targeting, the policy interest rate remains a valid and effective monetary instrument for achieving price stability. Monetary
aggregate also remains significant in the conduct of monetary policy in Thailand.
Also, the exchange rate remains a valid channel through which monetary policy is transmitted to international trade via exports, imports and the trade balance.
Based on the results obtained in this paper, some policy implications for the conduct of monetary policy in Thailand can be drawn. First, achieving price stability
is essential to maintain steady economic growth. One policy implication is that
the price stability requires a credible and transparent monetary policy with
an effective monetary transmission mechanism. Second, as monetary aggregate
is important in the monetary transmission mechanism, the Bank of Thailand
should not ignore monetary aggregates in its implementation of monetary policy
for price stability. The Bank of Thailand might opt for a monetary aggregate
as a supplementary instrument of monetary policy under its inflation targeting
framework. Third, when and if necessary, exchange rate measures, such as capital
controls and foreign exchange rate intervention, could be used to stabilise the
exchange rate, improve international trade and ensure overall economic stability.

Thailand’s Household Income Inequality
Revisited: Evidence from Decomposition
Approaches

Natthaphat Kingnetr1,2, Supanika Leurcharusmee2(B), and Songsak Sriboonchitta2,3

1 Bank of Thailand, Northern Region Office, Chiang Mai, Thailand
natthaphat.kingnetr@outlook.com
2 Faculty of Economics, Chiang Mai University, Chiang Mai, Thailand
supanika.econ.cmu@gmail.com, songsakecon@gmail.com
3 Puey Ungphakorn Center of Excellence in Econometrics, Chiang Mai University, Chiang Mai, Thailand

Abstract. This study decomposes income inequality across households in Thailand in three dimensions: sources of income, industrial subgroups, and household characteristics. The results show that the source of income with the highest contribution to the inequality is income from businesses. In terms of industry, we found that real estate, wholesale and retail trade, manufacturing and agriculture experience the highest income inequality. For the analysis of household characteristics, we examined the drivers of overall income inequality and also examined each source of income and industrial subgroup separately. The results draw attention to the importance of households' wealth, as inequality in financial assets and credit accessibility contributes more to income inequality than education does. In addition, we found that the key contributors to income inequality are heterogeneous across industrial subgroups. In particular, different types of financial assets and funds contribute differently to income inequality in each industrial subgroup. Therefore, in addition to ensuring equal opportunity in education, more equal access to different types of funding is also crucial for income inequality reduction.

Keywords: Income inequality · Decomposition · Generalized Entropy · Heterogeneity · Thailand

1 Introduction
Although global income inequality has not changed much and differences between countries have decreased, within-country inequality increased significantly from 1988 to 2008 [14]. With economic growth, incomes at the bottom grew much more slowly than those at the top and did not grow at all in some countries. This has caused a rise in the income share of the top percentiles in nearly
all countries. Several studies in both developed and developing countries support these findings [1,4,20].
Is income inequality a problem? The literature has debated the impacts of inequality on economic growth, as theoretically inequality can yield both positive and negative impacts on growth. Federico Cingano summarized in [4] that the positive impacts occur because inequality gives people incentives to work harder and save more, thereby increasing productivity and capital accumulation. As for the negative impacts, inequality causes under-investment in education among the poor due to financial market imperfections. At the aggregate level, inequality reduces demand and may reduce the adoption of technologies that require economies of scale. Finally, strong inequality can lead to political instability and social unrest. In addition to economic impacts, inequality also has negative effects on people's physical and psychological health and well-being [3,9]. As inequality has both pros and cons, some level of inequality is desirable. However, as economic growth has not yet improved income inequality in general and the level of inequality within countries is increasing in many countries, the disadvantages of inequality can have more serious effects.
Whether the high level of inequality should be a concern also depends on the
belief in the Kuznets’ hypothesis, which states that the level of inequality will
decrease once the economy is developed to a certain point. The study [15] by Iven Lyubimov reviews the literature on the Kuznets' hypothesis and discusses multiple studies whose results contradict it. The key opposing study [19] is by Thomas Piketty, who uses 100 years of data from 20 countries to examine the relationship between economic growth and income inequality. His results suggest that an automatic decrease in inequality cannot be expected as the economy develops, which contradicts the Kuznets' hypothesis.
For the drivers of inequality, according to [19], they come from two fundamental sources: labour earning potential and inherited wealth. Therefore, the gap between the return to capital and economic growth accelerates inequality. In addition, Dabla-Norris et al. [7] found that an increase in the skill premium is the key factor that increases income inequality in developed countries, while financial deepening is the key factor that increases income inequality in developing countries. Evidence has shown that the key sources of income inequality are either unequal opportunities in education and wealth or policies, such as a progressive tax on labour income and inherited wealth, that are not well designed.
Glomm and Ravikumar [11] found that an equal opportunity in education through
public investment in human capital leads to a decline in income inequality. Institu-
tions and policies also play an important role in controlling the level of income and
wealth inequality. In addition, several studies [1,8] found that democracy increases
redistribution when inequality rises and, thus, reduces inequality.
For the case of Thailand, the World Bank reported the Gini coefficient
of 37.80 for Thailand in 2013. Credit Suisse’s Global Wealth Databook 2017
reported that the wealth share of the top 1% in Thailand was 56.2%, which was
greater than the world average of 50.1%. The Gini coefficients calculated using
the national household data from the socio-economic survey have dropped and present the Kuznets' pattern [13,26]. However, both the level of inequality and the trend depend heavily on the choice of measurement and empirical modelling. While some studies found that the inequality problem in Thailand is declining, others found contradictory evidence [13,16–18,26]. Regarding the drivers of income inequality, several studies agree that unequal opportunities and choices in education are a key driver of income inequality [13,16,18]. In addition, Kilenthong [13] found that occupation, financial access, and urban versus rural location contribute to the explanation of income inequality. Paweenawat [18] found that family factors, such as the number of children and earners, also play an important role. Pawasutipaisit [17] uses monthly panel data to examine upward mobility in net worth and finds that the return on assets is the key factor. Meneejuk [16] examines macroeconomic factors and finds that the share of the private sector and inflation also have an impact on inequality.
In this study, we use data from the 2015 Socio-economic Survey (SES) to calculate the Generalized Entropy (GE) measure of household income inequality. We use household-level data instead of individual-level data because of the possibility of task specialization within the family. It is not unusual in the Thai context that some family members specialize in housework or child care, while other members specialize in labour market work. Using individual-level data to calculate income inequality can therefore be misleading. The main reasons for adopting the GE method for inequality measurement are the following: (1) it is widely used for inequality measurement in the literature, (2) it is more robust than the Gini index when the data include zero and negative incomes, and (3) it satisfies all necessary properties of a well-designed inequality index, as discussed in [12].
The income inequality can then be decomposed to examine the sources of the inequality. This study performs three inequality decomposition approaches: the source decomposition approach [22], the subgroup decomposition approach [23], and the regression-based decomposition approach [6,10]. In the first approach, we examine which sources of household income contribute most to income inequality. As the results show that profits from businesses, wages from employment, and profits from farming are the key sources, the second approach examines the role of industrial subgroups in the income inequality within each of the three income sources. The results show that real estate, wholesale and retail trade, manufacturing and agriculture experience the highest income inequality.
The third approach identifies the drivers of income inequality. As inequality drivers can differ across both income sources and industries, we decompose the income inequality for each of the main sources and industries. Following the framework introduced by Thomas Piketty in [19] and previous empirical studies [2,6,13,18], we emphasize the effects of human and physical capital on the inequality. Human capital is measured by years of education, age and gender. Physical capital is measured by financial assets, size of owned land and access to the internet. In addition, we also include credit constraint variables, namely debt for house and land, business and agricultural
activities. Consistent with other literature, unequal levels of education are highly associated with income inequality. However, the results show that financial assets contribute most to the inequality. The significance of financial assets and business debts is consistent with [19], which emphasizes the association of capital with income inequality. It should be noted that unobserved factors, such as the ability of household members, that affect both income and financial assets can cause an upward bias in the estimate of the impact of financial assets. However, our results show a large gap between the effects of financial assets and other inequality drivers, and thus financial assets should be considered an important driver regardless.
Although the key drivers of inequality are the same across income sources and industries, the size of the contribution of each factor differs. In particular, different types of financial assets and funds contribute differently to income inequality in each industrial subgroup.

2 Methodology and Data


In this section, we start with a discussion of the inequality measurement, followed by three well-known income inequality decomposition approaches: the source decomposition approach [22], the subgroup decomposition approach [23], and the regression-based decomposition approach [6,10]. Finally, the analysis process and the data used in the study are explained.

2.1 Generalized Entropy Inequality Measurement


Haughton pointed out in [12] that a good measure of income inequality should satisfy the following requirements: mean independence, population size independence, symmetry, Pigou-Dalton transfer sensitivity, decomposability, and statistical testability. Cowell introduced the so-called generalized entropy (GE) index for inequality measurement in [5], based on the information theory introduced by Shannon in [21], as an improvement on the work [25] by Theil. It satisfies all the mentioned criteria. The formula for the GE index can be specified as follows:
$$GE(\alpha) = \frac{1}{\alpha(\alpha - 1)}\left[\frac{1}{n}\sum_{i=1}^{n}\left(\left(\frac{y_i}{\bar{y}}\right)^{\alpha} - 1\right)\right], \qquad (1)$$
where $GE(\alpha)$ represents the value of the generalized entropy inequality index for a given value of the parameter $\alpha$, $y_i$ denotes the $i$-th income observation, $\bar{y}$ denotes mean income, and $\alpha$ governs how sensitive the index is to the way incomes are spread across different parts of the income distribution. The higher the value of $\alpha$, the more sensitive the index is to deviations in incomes at the upper tail. In this study, we employ $\alpha = 0, 1,$ and $2$, as these values are well known in the literature and commonly used by leading organizations such as the IMF and the World Bank. In addition, GE(0) is also called Theil's L index and GE(1) is called Theil's T index [12].
One prominent downside of the GE is that it can take any value from zero to infinity [12]. Zero indicates perfect income equality and a greater value means larger income inequality. Nevertheless, unlike the conventional Gini index, which is sensitive to data with negative incomes, the GE is more robust when researchers want to include zero and negative incomes in their analyses. According to [12], the equation for the Gini index can be specified as follows:
$$Gini = 1 - \frac{1}{N}\sum_{i=1}^{N}\left(y_i + y_{i-1}\right), \qquad (2)$$
where $y_i$ is the $i$-th income observation and $N$ is the total number of observations. If the data contain negative incomes, the value of the Gini index may exceed one, violating the requirement that the Gini index lie between 0 and 1. In addition, the Gini index lacks decomposability, meaning that the sum of the inequality of subgroups is not equal to the total inequality [12]. This property is crucial in our analysis as we are interested in decomposing the inequality by subgroups. For these reasons, it is appropriate to employ the GE in this study as we seek to investigate inequality through different decomposition approaches.
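For concreteness, the GE index in Eq. 1 can be computed directly, with GE(0) and GE(1) handled as their usual limiting forms (Theil's L and T indices). The sketch below is only illustrative: the function name and the small income vector are our own choices, not part of the SES analysis.

```python
import numpy as np

def generalized_entropy(y, alpha):
    """Generalized entropy index GE(alpha) for a vector of incomes y.

    GE(0) and GE(1) are the limiting cases (Theil's L and T indices);
    observations must be strictly positive for these two cases.
    """
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    if alpha == 0:                       # Theil's L (mean log deviation)
        return np.mean(np.log(mean / y))
    if alpha == 1:                       # Theil's T
        return np.mean((y / mean) * np.log(y / mean))
    return np.mean((y / mean) ** alpha - 1) / (alpha * (alpha - 1))

# Example with a small, made-up income vector
incomes = [5000, 12000, 18000, 25000, 60000, 150000]
for a in (0, 1, 2):
    print(f"GE({a}) = {generalized_entropy(incomes, a):.3f}")
```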

2.2 Decomposition by Sources

The approach was introduced by Shorrocks [22] to investigate the contributions of sources of income to total income inequality. This method requires that

$$y = \sum_{f=1}^{F} z_f, \qquad (3)$$

where $y$ is total income, $z_f$ denotes the amount of income from source $f$, and $F$ is the total number of income sources. Then, one can investigate the share of these income components in income inequality through the natural decomposition rule, which in general form is

$$I(y) = \sum_{f=1}^{F} \theta_f, \qquad (4)$$

where $I(y)$ is the value of total income inequality measured by any inequality measure, and $\theta_f$ is the absolute share of income inequality from income source $f$, which has the same unit of measure as $I(y)$. This rule states that the sum of the inequality from each income source should equal the income inequality index calculated using total income. However, we are more interested in the percentage share, $s_f$. From Eq. 4, we can see that $s_f = \theta_f / I(y)$. Alternatively, based on [22], we can obtain $s_f$ from

$$s_f = \frac{\mathrm{cov}(z_f, y)}{\sigma^2(y)}. \qquad (5)$$
Therefore, the share of total income inequality from a particular source $f$ can be measured as the covariance between source $f$ and total income $y$, divided by the variance of $y$. Notice that this step does not involve any particular income inequality measure. The advantage of the Shorrocks approach [22] is that the percentage share of sources in total income inequality remains the same for any inequality measure [2]. However, the absolute share may vary depending on the measure of income inequality [6].
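A minimal sketch of this covariance-based share calculation (Eq. 5) is given below; the income-source names and simulated figures are hypothetical and are not drawn from the SES data.

```python
import numpy as np

def source_shares(sources):
    """Percentage shares s_f = cov(z_f, y) / var(y) for a dict of income sources.

    `sources` maps a source name to an array of incomes from that source;
    total income y is the element-wise sum of all sources (Eq. 3).
    """
    names = list(sources)
    Z = np.column_stack([np.asarray(sources[f], dtype=float) for f in names])
    y = Z.sum(axis=1)
    var_y = y.var(ddof=1)
    return {f: np.cov(Z[:, i], y)[0, 1] / var_y for i, f in enumerate(names)}

# Hypothetical example with three income sources
rng = np.random.default_rng(0)
data = {
    "wages": rng.lognormal(9.0, 0.5, 1000),
    "business": rng.lognormal(8.5, 1.0, 1000),
    "farming": rng.lognormal(8.0, 0.7, 1000),
}
for f, s in source_shares(data).items():     # the shares sum to (about) 1
    print(f"{f}: {100 * s:.1f}%")
```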

2.3 Decomposition by Subgroup

According to Shorrocks [23], the GE index can be decomposed by population subgroup into two types of inequality as follows:

$$GE(\alpha) = GE_w(\alpha) + GE_b(\alpha), \qquad (6)$$

and

$$GE_w(\alpha) = \sum_{k=1}^{K} V_k^{1-\alpha} S_k^{\alpha}\, GE_k(\alpha), \qquad (7)$$

where $GE_w(\alpha)$ is the within-group inequality, $GE_b(\alpha)$ is the between-group inequality, $V_k$ is the proportion of people in subgroup $k$ in the total population, $S_k$ is the total income share belonging to subgroup $k$, and $GE_k(\alpha)$ is the GE index for subgroup $k$. Note that $GE_b(\alpha)$ assumes that all members of subgroup $k$ receive the mean income of that group, $\bar{y}_k$. Moreover, a greater level of within-group inequality indicates higher income variation within the subgroup, while a greater level of between-group inequality indicates strong income differences between subgroups.
The decompositions by sources and subgroups are useful for policy makers when it comes to policy prioritisation [24]. For instance, if the cause of total income inequality is the disparity in income among enterprises within an industry sector, then the government should focus on improving the competitiveness of the disadvantaged enterprises in that sector. Another example is that, if the variation in income is mainly from salaries rather than returns to investment, then the government may consider prioritising policies that improve human capital and labour market efficiency in order to alleviate income inequality.
However, the source and subgroup decomposition approaches have been criticised for being too restrictive [6]. The source approach requires that total income and its sources follow the natural decomposition rule, which means that total income must equal the sum of its sources, despite the approach's flexibility to be used with any type of income inequality measure. The subgroup approach requires a discrete variable as its partition criterion. However, other important socio-economic factors that potentially affect the income distribution are continuous (e.g. age, number of earners, debt-to-income ratio, and wealth). To overcome this issue, we turn to the regression-based decomposition approach [6,10].
2.4 Regression-Based Decomposition

The approach was first introduced by Fields [10] and further improved by Cowell [6]. The idea of this approach is based on a data generating process that takes the linear form, also known as a linear regression model, that is

$$y_i = \beta_0 + \sum_{k=1}^{K}\beta_k x_{k,i} + \varepsilon_i, \qquad (8)$$

where $y_i$ represents household income, $x_{k,i}$ is the $k$-th observable household characteristic, and $\varepsilon_i$ is an error term. As long as the conventional assumptions hold, such as exogeneity of the explanatory variables and an error term with zero mean and constant variance, one can use the OLS approach to obtain the parameters and residuals [6]. Then Eq. 8 can be rearranged into

$$y_i = \hat{\beta}_0 + \sum_{k=1}^{K} z_{k,i} + \hat{\varepsilon}_i, \qquad (9)$$

where $z_{k,i} = \hat{\beta}_k x_{k,i}$ is a composite variable. Brewer and Wren-Lewis [2] suggest the use of the source decomposition technique of Shorrocks [22], with the composite variables and residuals treated as income sources. Thus, the contribution of household characteristics to total inequality can be calculated as

$$s_k(y) = \frac{\mathrm{cov}(z_k, y)}{\sigma^2(y)}, \qquad (10)$$

which is similar to Eq. 5 except that the composite variables are used instead of income sources. Also note that this approach allows us to investigate the contribution of the residual to total income inequality. It is highly important that this term not be ignored, as the presence of the residual in the calculation of inequality shares can potentially affect the results [6].
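A compact sketch of the regression-based decomposition in Eqs. 8-10 is shown below. The covariates (years of schooling, financial assets) and the simulated data are purely illustrative stand-ins, not the SES variables used in the paper.

```python
import numpy as np

def regression_based_shares(X, y, names):
    """Regression-based inequality decomposition (Eqs. 8-10).

    Fits y on X by OLS, forms composite variables z_k = beta_k * x_k,
    and returns the share of var(y) attributed to each z_k and to the residual.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])         # add intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    Z = X * beta[1:]                                   # composite variables z_k
    resid = y - A @ beta
    var_y = y.var(ddof=1)
    shares = {name: np.cov(Z[:, k], y, ddof=1)[0, 1] / var_y
              for k, name in enumerate(names)}
    shares["residual"] = np.cov(resid, y, ddof=1)[0, 1] / var_y
    return shares                                      # shares sum to 1

# Hypothetical covariates: years of schooling and financial assets
rng = np.random.default_rng(1)
school = rng.integers(0, 16, 500).astype(float)
assets = rng.lognormal(10, 1, 500)
income = 2000 * school + 0.05 * assets + rng.normal(0, 20000, 500)
print(regression_based_shares(np.column_stack([school, assets]),
                              income, ["schooling", "financial assets"]))
```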

2.5 Analysis Process and Data

We perform our analysis using the methods discussed above with the SES data in
2015 from the National Statistical Office (NSO) of Thailand. We consider first
the decomposition by household income sources to see which types of house-
hold income experience the largest income inequality. Then, we examine the role
of industrial subgroups to the income inequality. Lastly, the regression-based
decomposition is employed to investigate the within-group inequality. This will
allow us to assess how household characteristics contribute to the income inequal-
ity in each important occupation subgroup. Four important household characteristics are considered in this study: (1) the level of education of the household's highest earner, (2) the level of household financial assets, (3) the level of credit accessibility, and (4) the level of internet accessibility.
3 Empirical Results
It can be seen from Table 1 that business income contributes the most to household income inequality (60.7%), followed by wages and salaries (19.8%) and farming (9.8%). This is expected as business tends to generate higher income and involve greater risks than the other types of work, as seen from its highest GE value. However, certain types of business or industry may face higher or lower income disparity than others. We therefore examine industry subgroups to see in more detail how each sector experiences income inequality.

Table 1. Absolute income inequality and share to total inequality by income source

Income source                                           GE(2)   Share of income inequality (%)
Wages and salaries                                      0.214   19.8
Net profit from business                                0.655   60.7
Net profit from farming                                 0.106   9.8
Pensions, annuities, and other assistances              0.019   1.8
Work compensations or terminated payment                0.001   0.1
Money assistance from other people outside household    0.004   0.3
Elderly and disability assistance from government       0.000   0.0
Income from rent of house, land, and other properties   0.009   0.9
Saving interest, bonds and stocks                       0.010   0.9
Interest of individual lending                          0.000   0.0
In-kind                                                 0.026   2.4
Other                                                   0.035   3.3
Total                                                   1.079   100.0

3.1 Income Inequality by Industry Subgroup

According to Table 2, the top five industrial subgroups facing income inequality are real estate, trade, manufacturing, agriculture, and construction. These industries constitute approximately 65% of the sampled households in the survey. Moreover, the within-group inequality is much higher than the between-group inequality for all three types of the GE index. This suggests that the income inequality due to different types of occupation may not be as severe as the inequality in the way income is distributed within a particular industry itself.
Table 2. Income inequality by industrial subgroup

Rank  Industry                                                               GE(0)  GE(1)  GE(2)  Obs
1     Real estate activities                                                 0.473  0.699  2.105  100
2     Wholesale and retail trade; repair of motor vehicles and motorcycles   0.317  0.446  1.746  6,158
3     Manufacturing                                                          0.243  0.339  1.570  5,082
4     Agriculture, forestry and fishing                                      0.269  0.363  1.258  8,787
5     Construction                                                           0.253  0.333  0.864  2,634
6     Activities of households as employers                                  0.324  0.417  0.828  264
7     Transportation and storage                                             0.277  0.354  0.765  945
8     Activities of extraterritorial organizations and bodies                0.524  0.522  0.630  4
9     Administrative and support service activities                          0.249  0.295  0.507  389
10    Water supply; sewerage, waste management and remediation activities    0.247  0.289  0.463  103
11    Professional, scientific and technical activities                      0.279  0.285  0.377  233
12    Other service activities                                               0.222  0.240  0.368  829
13    Financial and insurance activities                                     0.261  0.263  0.335  419
14    Information and communication                                          0.259  0.260  0.334  167
15    Human health and social work activities                                0.232  0.237  0.327  900
16    Electricity, gas, steam and air conditioning supply                    0.275  0.257  0.307  137
17    Education                                                              0.217  0.217  0.304  1,750
18    Accommodation and food service activities                              0.205  0.216  0.298  2,542
19    Arts, entertainment and recreation                                     0.216  0.217  0.282  251
20    Public administration and defence; compulsory social security          0.183  0.196  0.278  2,717
21    Mining and quarrying                                                   0.154  0.159  0.191  85
      Within-group inequality                                                0.256  0.321  1.018
      Between-group inequality                                               0.056  0.058  0.061
      Total inequality                                                       0.314  0.379  1.079
Note: The ranking is based on the values of GE(2) in descending order.

Figure 1 further illustrates this inequality. It can be seen that the agriculture sector constitutes the largest population and also exhibits the lowest relative household monthly income, which is approximately half of the average household income as a whole. Even worse, the income disparity in this subgroup is also high compared to the others. On the contrary, households working in financial and insurance activities as well as information and communication earn nearly twice the average household income, yet experience a more equal income distribution. This finding implies that certain household characteristics may play an important role in income disparity both between and within
the groups. We now turn to the regression-based decomposition approach to see what causes such differences.

Fig. 1. The GE(2) index and relative monthly incomes of households working in each Thai industry

3.2 Income Inequality Decomposed by Household Characteristics


We first start with an analysis of the potential underlying factors of income inequality as a whole. The results of the regression-based decomposition are shown in Table 3. It can be seen that total inequality mainly stems from financial assets (11%), access to credit for business (6%), education (4%), exposure to the internet (3%), and access to credit for house and land (1%). These findings are consistent with what we discussed in the previous section, where agriculture faces the lowest household income while those in financial activities earn much more. In general, this implies that industries involving skilled labour and a high utilisation of internet technology, financial assets, and credit tend to generate higher income than those that do not.
However, the income inequality within each industry may be driven by different factors. Thus, we turn our investigation to three important industrial subgroups: agriculture, manufacturing, and wholesale and retail trade, since they exhibit relatively high income inequality and involve more than half of the total population in this study.
Table 3. Decomposition of total income inequality by regression-based approach

Source of inequality                       GE(0)  GE(1)  GE(2)  Share of inequality (%)
Female                                     0.000  0.000  0.000  0.0
Age                                        0.001  0.001  0.003  0.2
Years of schooling                         0.013  0.016  0.046  4.3
Proportion of earner                       0.000  0.001  0.002  0.2
Proportion of member accessing internet    0.009  0.011  0.031  2.9
Size of owned land                         0.001  0.001  0.002  0.2
Size of rented land                        0.000  0.000  0.001  0.1
Financial assets                           0.035  0.042  0.119  10.9
Debt for house and land                    0.004  0.005  0.015  1.4
Debt for business activity                 0.019  0.024  0.067  6.2
Debt for agricultural activity             0.001  0.001  0.002  0.2
Residual                                   0.231  0.279  0.793  73.4
Total                                      0.314  0.379  1.079  100.00
Note: The top five factors contributing to the inequality are in bold.

Table 4 shows the results of the household inequality decomposition for the three important subgroups. Starting with the agricultural sector, it can be seen that the important elements of income inequality in this group are slightly different from the overall picture. Financial assets remain the largest source of income inequality (13%), followed by the ability to access loans for agricultural activities (4%) and the size of land in possession (3%). The effect of education, however, is rather low in this group, constituting only 0.3% of the within-group inequality. This implies that the level of education has not received much attention for work in the agricultural sector. A likely reason is that many farming families in Thailand still employ traditional ways of farming, passed down from one generation to the next. Without much financial asset and land at their disposal, households with rental land have to bear the costs, which significantly reduce their net income. Combined with the difficulty of accessing credit for agricultural activities, this denies them the opportunity to invest in better agricultural equipment and machinery, widening the income gap within this group even further.
In the case of the manufacturing sector, financial assets become an even more important source of income inequality (32%). On the contrary, the inequality resulting from education, or even from the other factors considered in this study, appears insignificant. This implies that the differences in household characteristics in this group are not large except for financial assets. To be able to run a manufacturing business, certain levels of these characteristics must be met; what remains is the wealth of the household to initiate the business. Obviously, households with high wealth are able to invest more in their manufacturing activity than those without, and thus generate more income.
Table 4. Factor source decomposition of within-inequality of important subgroup

Element of inequality                      Share of within-group inequality (%)
                                           Agriculture   Manufacturing   Wholesale and retail trade
Female 0.0 0.3 0.0
Age 0.0 0.1 0.3
Years of schooling 0.3 0.9 0.9
Proportion of earner 0.1 0.1 0.1
Proportion of member accessing internet 0.7 1.1 0.7
Size of owned land 2.7 0.0 0.3
Size of rental land 0.7 0.0 0.0
Financial assets 13.2 32.0 6.0
Debt for house and land 0.1 0.8 0.5
Debt for business activity 0.4 0.1 16.7
Debt for agricultural activity 4.0 0.0 0.0
Residual 77.8 64.5 74.5
Total 100.0 100.0 100.0

For the trade subgroup, accessibility to credit is more important, constituting approximately 17% of the within-group inequality. Financial assets also remain an important source although, unlike in manufacturing, they make up only about 6% of the within-group inequality. These findings suggest that households engaged in trade tend to rely on a stable cash flow to run their businesses. Households with better access to credit can adjust both the quantity and the type of their products as market conditions and consumer behaviour change. Similar to agriculture and manufacturing, the contribution of education inequality to income inequality is not large, implying that earners of households working in this subgroup achieve similar levels of education.

4 Conclusion and Policy Implication


This study investigates the drivers of household income inequality in Thailand using decomposition approaches in three dimensions: sources of income, industrial subgroups, and household characteristics. The data come from the socio-economic survey conducted by the National Statistical Office in 2015. In terms of household income sources, the results show that households doing business experience the largest income disparity, followed by employment and farming. The results further indicate that inequality in education, while remaining an important factor affecting household income inequality as a whole, contributes less to income inequality than financial assets and credit accessibility. In addition, internet exposure is another key factor that should receive greater attention in order to mitigate income inequality.
Furthermore, we discover that the key contributors to income inequality are heterogeneous across the selected industrial subgroups. In all three sectors, wealth and credit accessibility have the highest contributions. However, while inequality in households' own financial assets contributes most to income inequality in the manufacturing and agricultural sectors, inequality in debt for business activity contributes more in the wholesale and retail trade sector. In addition to financial assets and debt, inequality in land ownership is more important in the agricultural sector. These findings confirm that there is no one-size-fits-all policy. To effectively reduce income inequality, each policy must be carefully designed as well as prioritised for a specific industry.
Regarding the effects of education on income inequality, our analysis leads to a striking conclusion: education contributes much less than households' wealth and credit accessibility. This can be a concern, as mitigating unequal opportunity in education has been a key attempt to improve overall household income inequality, as many organizations have pointed out. Years of schooling, or traditional education, has a low contribution to income inequality, especially in the agricultural sector. It has been widely discussed in the literature that human and physical capital are at times complements in the production process. This could imply an incoherence between the content of the education provided and the accessibility of other relevant resources needed to yield a productive outcome. For educated farmers to achieve a higher income, and thus for education to show up in income differences, the sector requires a certain level of access to capital and technological transfer.
According to our findings, the policy implications are as follows. First of all, the distribution of wealth in its many forms affects the distribution of income. Therefore, taxation on wealth, such as land, real estate, and financial assets, to redistribute wealth will play a central role in mitigating income inequality. However, great attention must be paid to the design of such a measure as it also has a tendency to hurt the low- and middle-income population in various aspects. If the effects on the low- and middle-income population outweigh those on the rich, then the level of inequality can also increase. Secondly, internet accessibility contributes to income inequality no less than education. Consequently, nation-wide internet access is necessary to reduce income inequality across regions. Having access to the internet not only allows a person to reach a vast body of knowledge, which may help his/her current income-generating activities, but also opens up business and job opportunities in the future. Thirdly, pico- and nano-finance providers or peer-to-peer loan services should be developed; this will increase opportunities for those who lack funds but are full of creativity and innovation to run their businesses.
Finally, it should be emphasized that, in order to successfully mitigate income inequality, it is essential for policies to be imposed coherently and complementarily to each other. Focusing on a single policy at a time would probably not lead to satisfactory development, as a lift in income is not simply the result of development in one driver but rather an accumulation of improvements in
related drivers of income. In particular, for the policies improving human and physical capital to be most effective, they must be provided coherently. As an example, consider the case of loan accessibility. It is difficult to ensure loan accessibility with decent interest rates for all when parts of the population are not yet financially literate and equipped with enough knowledge to search for investment opportunities. The risks are high and the returns, in terms of both private and social benefits, will not match the risk, causing the relevant policies to be unsustainable.
An issue to be noted about the measurement of income inequality is that a large portion of the inequality comes from the top percentiles. Most inequality measurement relies on household surveys, in which the top percentiles are usually under-represented. For the case of Thailand, [26] used both the SES, which is a household survey, and tax return data to calculate the Gini coefficient in 2007 and 2009. The results show that the Gini coefficient decreased over the years when calculated using the household survey, but increased when calculated using the tax return data. However, the poor and the middle class are relatively well represented in household data and, thus, the inequality decomposition results of this research are applicable to the lower-income population. In addition, there might exist factors, such as the ability of household members, that affect both income and financial assets. Inadequate control for such factors can cause an upward bias in the estimate of the impact of financial assets.
Since this study employs only one time period, the dynamics of income inequality decomposition in Thailand remain to be examined. Comparing different decomposition structures over time would further clarify our understanding of income inequality development in Thailand. Another point to be considered is that this study could explain only approximately 25 to 30% of the income inequality. This suggests that the model should be enriched with more household characteristics in future research.

References
1. Alvaredo, F., Chancel, L., Piketty, T., Saez, E., Zucman, G.: Global inequality
dynamics: new findings from WID.world. Working Paper Series (23119) (2017)
2. Brewer, M., Wren-Lewis, L.: Accounting for changes in income inequality: decom-
position analyses for the UK 1978–2008. Oxford Bull. Econ. Stat. 78(3), 289–322
(2016)
3. Buttrick, N.R., Oishi, S.: The psychological consequences of income inequality. Soc.
Pers. Psychol. Compass 11(3), e12304 (2017)
4. Cingano, F.: Trends in income inequality and its impact on economic growth.
OECD Social, Employment and Migration Working Papers (163) (2014)
5. Cowell, F.A.: Generalized entropy and the measurement of distributional change.
Eur. Econ. Rev. 13(1), 147–159 (1980)
6. Cowell, F.A., Fiorio, C.V.: Inequality decompositions-a reconciliation. J. Econ.
Inequality 9(4), 509–528 (2011)
7. Dabla-Norris, E., Kochhar, K., Suphaphiphat, N., Ricka, F., Tsounta, E.: Causes
and consequences of income inequality: a global perspective. IMF Staff Discussion
Notes 15/13, International Monetary Fund (2015)
8. Dodlova, M., Gioblas, A.: Regime type, inequality, and redistributive transfers in
developing countries. WIDER Working Paper 2017/30 (2017)
9. Elgar, F.J., Gariépy, G., Torsheim, T., Currie, C.: Early-life income inequality and
adolescent health and well-being. Soc. Sci. Med. 174, 197–208 (2017)
10. Fields, G.S.: Accounting for income inequality and its change: a new method, with
application to the distribution of earnings in the United States. Res. Labor Econ.
22, 1–38 (2003)
11. Glomm, G., Ravikumar, B.: Public versus private investment in human capital:
endogenous growth and income inequality. J. Polit. Econ. 100(4), 818–834 (1992)
12. Haughton, J.H., Khandker, S.R.: Handbook on Poverty and Inequality. World
Bank, Washington, DC (2009)
13. Kilenthong, W.: Finance and inequality in Thailand. Thammasat Econ. J. 34(3),
60–95 (2016)
14. Lakner, C., Milanovic, B.: Global income distribution: from the fall of the Berlin
wall to the great recession. World Bank Econ. Rev. 30(2), 203–232 (2016)
15. Lyubimov, I.: Income inequality revisited 60 years later: Piketty vs Kuznets. Russ.
J. Econ. 3(1), 42–53 (2017)
16. Meneejuk, P., Yamaka, W.: Analyzing the relationship between income inequal-
ity and economic growth: Does the Kuznets curve exist in Thailand? Bank
of Thailand Setthatat Paper (2016). https://www.bot.or.th/Thai/Segmentation/
Student/setthatat/DocLib Settha Paper 2559/M Doc Prize2 2559.pdf
17. Pawasutipaisit, A., Townsend, R.M.: Wealth accumulation and factors accounting
for success. J. Econ. 161(1), 56–81 (2011)
18. Paweenawat, S.W., McNown, R.: The determinants of income inequality in
Thailand: a synthetic cohort analysis. J. Asian Econ. 31–32, 10–21 (2014)
19. Piketty, T.: Capital in the Twenty-First Century. Harvard University Press,
Cambridge (2014)
20. Ravallion, M.: Income inequality in the developing world. Science 344(6186), 851–
855 (2014)
21. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3),
379–423 (1948)
22. Shorrocks, A.F.: Inequality decomposition by factor components. Econometrica
50(1), 193–211 (1982)
23. Shorrocks, A.F.: Inequality decomposition by population subgroups. Econometrica
52(6), 1369–1385 (1984)
24. Stephen, P.J.: Analysis of income distributions. Stata Tech. Bull. 8(48) (1999)
25. Theil, H.: Economics and Information Theory. Studies in Mathematical and Man-
agerial Economics. North-Holland Pub. Co., Amsterdam (1967)
26. Vanitcharearnthum, V.: Top income shares and inequality: evidences from
Thailand. Kasetsart J. Soc. Sci. (2017, in press)
Simultaneous Confidence Intervals for All
Differences of Variances of Log-Normal
Distributions

Warisa Thangjai and Suparat Niwitpong(B)

Faculty of Applied Science, Department of Applied Statistics, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
wthangjai@yahoo.com, suparat.n@sci.kmutnb.ac.th

Abstract. In this paper, simultaneous confidence intervals for all differences of variances of log-normal distributions are proposed. Our approaches are based on the generalized confidence interval (GCI) approach and a simulation-based approach. Simulation studies show that the GCI approach performs satisfactorily in all cases; however, the simulation-based approach is recommended when the standard deviations are equal, and the GCI approach is recommended otherwise. Finally, a numerical example is given to illustrate the advantages of the proposed approaches.

Keywords: Simultaneous confidence intervals · Variance · Log-normal distribution · GCI approach · Simulation-based approach

1 Introduction
The log-normal distribution has been widely used in medicine, biology, economics, and several other fields. The distribution is right-skewed and is used to describe positive data, for example in bioequivalence studies comparing a test drug to a reference drug (see Hannig et al. [2]); in biological systems, for studying the mass of cultures or the area of plant leaves in early stages of growth, gene expression and metabolite contents (see Schaarschmidt [10]); and in survival analysis, for analysing the survival times of breast and ovarian cancer patients (see Royston [8]).
The variance is one of the most commonly used dispersion measures in statistics and many other fields. It is measured in the squared units of the respective variable. In many experimental settings, multiple comparisons among treatments are common; therefore, comparing variances is important. For two independent variances, confidence interval estimation for the difference between two variances has been proposed; recent work includes Herbert et al. [4], Cojbasica and Tomovica [1], Niwitpong [5], and Niwitpong [6]. For k independent variances, constructing simultaneous confidence intervals for pairwise differences of variances is of interest. To our knowledge, there is no previous work on constructing simultaneous confidence intervals for all differences of variances of log-normal distributions.
In this paper, the construction of simultaneous confidence intervals for all differences of variances of log-normal distributions is proposed using the GCI approach and a simulation-based approach. The GCI approach uses the generalized pivotal quantity (GPQ) for the parameter; the concept of the GPQ is explained by Weerahandi [15]. Thangjai et al. [12] proposed simultaneous confidence intervals for all differences of means of normal distributions with unknown coefficients of variation based on the GCI approach and the method of variance estimates recovery (MOVER) approach. Moreover, Thangjai et al. [13] presented simultaneous confidence intervals for all differences of means of two-parameter exponential distributions using the GCI approach, the MOVER approach, and a parametric bootstrap approach. The simulation-based approach was proposed by Pal et al. [7]; it uses the maximum likelihood estimates (MLEs) for simulation and numerical computations. Here, the simulation-based approach is applied to construct simultaneous confidence intervals for all differences of variances of log-normal distributions and is then compared with the GCI approach.
The rest of the article is organized as follows. In Sect. 2, the procedure for constructing the simultaneous confidence intervals for all differences of variances of log-normal distributions is proposed. In Sect. 3, the performance of the two approaches is assessed using simulation studies. In Sect. 4, a data set is used to illustrate our approaches based on the GCI approach and the simulation-based approach. In Sect. 5, concluding remarks are presented.

2 Simultaneous Confidence Intervals


For the one-sample case, let $Y = (Y_1, Y_2, \ldots, Y_n)$ be independent log-normal random variables and let $X = \log(Y)$ be the corresponding independent random variables from $N(\mu, \sigma^2)$. The normal mean and normal variance are $\mu$ and $\sigma^2$, respectively. The log-normal mean and log-normal variance are equal to

$$\mu_Y = \exp\left(\mu + \frac{\sigma^2}{2}\right) \qquad (1)$$

and

$$\theta = \sigma_Y^2 = \exp\left(2\mu + \sigma^2\right)\cdot\left(\exp\left(\sigma^2\right) - 1\right). \qquad (2)$$

Following Shen [11], suppose that $\bar{X} = \sum_{j=1}^{n} X_j / n$ and $S_X^2 = \sum_{j=1}^{n}\left(X_j - \bar{X}\right)^2$ are jointly sufficient and complete for $\mu$ and $\sigma^2$, respectively. The maximum likelihood estimators of $\mu_Y$ and $\sigma_Y^2$ are given by

$$\tilde{\mu}_Y = \exp\left(\bar{X} + \frac{1}{2n} S_X^2\right) \qquad (3)$$

and

$$\tilde{\theta} = \tilde{\sigma}_Y^2 = \exp\left(2\bar{X} + \frac{1}{n} S_X^2\right)\cdot\left(\exp\left(\frac{1}{n} S_X^2\right) - 1\right). \qquad (4)$$
The adjusted maximum likelihood estimators for $\mu_Y$ and $\sigma_Y^2$ are defined as

$$\hat{\mu}_Y = \exp\left(\bar{X}\right)\cdot f\left(\frac{1}{2n} S_X^2\right) \qquad (5)$$

and

$$\hat{\sigma}_Y^2 = \exp\left(2\bar{X}\right)\cdot\left[f\left(\frac{2}{n} S_X^2\right) - f\left(\frac{n-2}{n(n-1)} S_X^2\right)\right], \qquad (6)$$

where $f(t) = 1 + t + \frac{n-1}{n+1}\cdot\frac{t^2}{2!} + \frac{(n-1)^2}{(n+1)(n+3)}\cdot\frac{t^3}{3!} + \ldots$.
Note that $\hat{\sigma}_Y^2$ is an unbiased estimator of $\sigma_Y^2$. The asymptotic variance of $\hat{\sigma}_Y^2$ is

$$\mathrm{Var}(\hat{\theta}) = \mathrm{Var}(\hat{\sigma}_Y^2) \approx \frac{2\sigma^2}{n}\cdot\exp\left(4\mu + 2\sigma^2\right)\cdot\left[2\left(\exp\left(\sigma^2\right) - 1\right)^2 + \sigma^2\left(2\exp\left(\sigma^2\right) - 1\right)^2\right]. \qquad (7)$$

For the k-sample case, let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})$ be a random sample based on the $i$-th log-normal population. Therefore, the log-normal variance based on the $i$-th sample is

$$\theta_i = \exp\left(2\mu_i + \sigma_i^2\right)\cdot\left(\exp\left(\sigma_i^2\right) - 1\right). \qquad (8)$$

Moreover, the maximum likelihood estimator of $\theta_i$ is

$$\tilde{\theta}_i = \exp\left(2\bar{X}_i + \frac{1}{n_i} S_{X_i}^2\right)\cdot\left(\exp\left(\frac{1}{n_i} S_{X_i}^2\right) - 1\right). \qquad (9)$$

For $i, l = 1, 2, \ldots, k$ and $i \neq l$, the differences of log-normal variances are equal to

$$\theta_{il} = \theta_i - \theta_l = \exp\left(2\mu_i + \sigma_i^2\right)\cdot\left(\exp\left(\sigma_i^2\right) - 1\right) - \exp\left(2\mu_l + \sigma_l^2\right)\cdot\left(\exp\left(\sigma_l^2\right) - 1\right). \qquad (10)$$

The estimator of $\theta_{il}$ is

$$\tilde{\theta}_{il} = \tilde{\theta}_i - \tilde{\theta}_l = \exp\left(2\bar{X}_i + \frac{1}{n_i} S_{X_i}^2\right)\cdot\left(\exp\left(\frac{1}{n_i} S_{X_i}^2\right) - 1\right) - \exp\left(2\bar{X}_l + \frac{1}{n_l} S_{X_l}^2\right)\cdot\left(\exp\left(\frac{1}{n_l} S_{X_l}^2\right) - 1\right). \qquad (11)$$
2.1 Generalized Confidence Interval Approach


Definition 2.1. Let X = (X_1, X_2, ..., X_n) be a random sample from a distribution
F(x|δ), where x is an observed value of X, δ = (θ, ν) is a vector of unknown
parameters, θ is the parameter of interest, and ν is a vector of nuisance parameters.
Weerahandi [15] defines a random quantity R(X; x, δ), called a generalized pivotal
quantity (GPQ), by the following two conditions:
(i) For a fixed x, the distribution of R(X; x, δ) is free of all unknown parameters.
(ii) The value of R(X; x, δ) at X = x is the parameter of interest.
In line with this general definition, the 100(1 − α)% two-sided generalized
confidence interval for the parameter of interest is defined by (R(α/2), R(1 − α/2)),
where R(α/2) and R(1 − α/2) denote the (α/2)-th and the (1 − α/2)-th quantiles of
R(X; x, δ), respectively.

Let $\bar{X}_i$ and $S_i^2 = \sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_i\right)^2/(n_i - 1)$ be the sample mean and sample variance, respectively. It is noted that $\bar{X}_i$ and $S_i^2$ are mutually independent with

$$\bar{X}_i \sim N\left(\mu_i, \frac{\sigma_i^2}{n_i}\right), \qquad V_i = \frac{(n_i - 1)S_i^2}{\sigma_i^2} \sim \chi^2_{n_i-1}, \qquad (12)$$

where $\chi^2_{n_i-1}$ denotes the chi-square distribution with $n_i - 1$ degrees of freedom.

According to Tian and Wu [14], the generalized pivotal quantities of $\mu_i$ and $\sigma_i^2$ based on the i-th sample are given by

$$R_{\mu_i} = \bar{x}_i - \frac{Z_i}{\sqrt{U_i}}\sqrt{\frac{(n_i - 1)s_i^2}{n_i}} \qquad (13)$$

and

$$R_{\sigma_i^2} = \frac{(n_i - 1)s_i^2}{V_i}, \qquad (14)$$

where $Z_i$ denotes a standard normal random variable and $U_i$ and $V_i$ denote chi-square random variables with $n_i - 1$ degrees of freedom.

The generalized pivotal quantity of $\theta_i$ is

$$R_{\theta_i} = \exp\left(2R_{\mu_i} + R_{\sigma_i^2}\right)\cdot\left(\exp\left(R_{\sigma_i^2}\right) - 1\right). \qquad (15)$$

For i, l = 1, 2, ..., k and i ≠ l, the generalized pivotal quantity of $\theta_{il} = \theta_i - \theta_l$ is

$$R_{\theta_{il}} = R_{\theta_i} - R_{\theta_l} = \exp\left(2R_{\mu_i} + R_{\sigma_i^2}\right)\cdot\left(\exp\left(R_{\sigma_i^2}\right) - 1\right) - \exp\left(2R_{\mu_l} + R_{\sigma_l^2}\right)\cdot\left(\exp\left(R_{\sigma_l^2}\right) - 1\right). \qquad (16)$$



Therefore, the 100(1 − α)% two-sided simultaneous confidence intervals for
$\theta_i - \theta_l$, i, l = 1, 2, ..., k, i ≠ l, based on the GCI approach are defined by

$$SCI_{il(GCI)} = \left(R_{\theta_{il}}(\alpha/2),\; R_{\theta_{il}}(1 - \alpha/2)\right), \qquad (17)$$

where $R_{\theta_{il}}(\alpha/2)$ and $R_{\theta_{il}}(1 - \alpha/2)$ denote the (α/2)-th and (1 − α/2)-th quantiles of $R_{\theta_{il}}$, respectively.
The simultaneous confidence intervals based on the GCI approach can be constructed
using the following Monte Carlo procedure (an illustrative code sketch is given after
the algorithm):
Algorithm 1
Step 1 Calculate the values of x̄i and s2i .
Step 2 Generate the values of Zi from standard normal distribution.
Step 3 Generate the values of Ui from chi-square distribution with ni −1 degrees
of freedom.
Step 4 Compute the values of Rμi by using formula (13).
Step 5 Generate the values of Vi from chi-square distribution with ni −1 degrees
of freedom.
Step 6 Compute the values of Rσi2 by using formula (14).
Step 7 Compute the values of Rθi by using formula (15).
Step 8 Compute the values of Rθil by using formula (16).
Step 9 Repeat Step 2 – Step 8 m = 1000 times.
Step 10 Compute the (α/2)-th and (1 − α/2)-th quantiles of Rθil .
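As an illustration only (not the authors' code), the following Python sketch implements Algorithm 1 with numpy; the function name and interface are our own choices. Applied to the log-scale summary statistics of the mice data in Sect. 4, it should reproduce the GCI intervals of Table 3 up to Monte Carlo error.

```python
import numpy as np

def gci_simultaneous_cis(xbar, s2, n, alpha=0.05, m=1000, seed=1):
    """Algorithm 1: GCI-based simultaneous CIs for all pairwise differences of
    log-normal variances.  xbar, s2 and n hold the log-scale sample means,
    sample variances and sample sizes of the k groups."""
    rng = np.random.default_rng(seed)
    xbar, s2, n = map(np.asarray, (xbar, s2, n))
    k = len(n)
    # Steps 2-6: GPQs for mu_i and sigma_i^2, replicated m times
    Z = rng.standard_normal((m, k))
    U = rng.chisquare(n - 1, size=(m, k))
    V = rng.chisquare(n - 1, size=(m, k))
    R_mu = xbar - Z / np.sqrt(U) * np.sqrt((n - 1) * s2 / n)      # formula (13)
    R_sig2 = (n - 1) * s2 / V                                     # formula (14)
    # Step 7: GPQ of each log-normal variance, formula (15)
    R_theta = np.exp(2 * R_mu + R_sig2) * (np.exp(R_sig2) - 1)
    # Steps 8-10: GPQ of each pairwise difference and its quantiles, (16)-(17)
    cis = {}
    for i in range(k):
        for l in range(k):
            if i != l:
                d = R_theta[:, i] - R_theta[:, l]
                cis[(i, l)] = (np.quantile(d, alpha / 2),
                               np.quantile(d, 1 - alpha / 2))
    return cis

# Log-scale summary statistics of the mice data in Sect. 4
print(gci_simultaneous_cis(xbar=[4.859, 4.867, 4.397],
                           s2=[0.927, 0.850, 0.696],
                           n=[20, 18, 19]))
```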

2.2 Simulation-Based Approach


Again, let X = log(Y) be a random sample from the normal distribution with mean μ
and variance σ². Let $\bar{X} = \sum_{j=1}^{n} X_j/n$ and
$S_X^2 = \sum_{j=1}^{n}\left(X_j - \bar{X}\right)^2$ be jointly sufficient and complete
for the mean μ and the variance σ², respectively. Let
$\theta = \exp\left(2\mu + \sigma^2\right)\cdot\left(\exp\left(\sigma^2\right) - 1\right)$
be the log-normal variance. The restricted maximum likelihood estimators of μ, σ² and θ are

$$\tilde{\mu}_{RML} = \bar{X} \qquad (18)$$

$$\tilde{\sigma}^2_{RML} = \frac{S_X^2}{n} \qquad (19)$$

and

$$\tilde{\theta}_{RML} = \exp\left(2\bar{X} + \frac{S_X^2}{n}\right)\cdot\left(\exp\left(\frac{S_X^2}{n}\right) - 1\right). \qquad (20)$$

For i = 1, 2, ..., k, the restricted maximum likelihood estimators of μ_i, σ_i² and θ_i are

$$\tilde{\mu}_{i(RML)} = \bar{X}_i \qquad (21)$$

$$\tilde{\sigma}^2_{i(RML)} = \frac{S_{X_i}^2}{n_i} \qquad (22)$$

and

$$\tilde{\theta}_{i(RML)} = \exp\left(2\bar{X}_i + \frac{S_{X_i}^2}{n_i}\right)\cdot\left(\exp\left(\frac{S_{X_i}^2}{n_i}\right) - 1\right). \qquad (23)$$

Let $X_{i(RML)} = (X_{i1(RML)}, X_{i2(RML)}, \ldots, X_{in_i(RML)})$ be a simulated
sample from the normal distribution with mean $\tilde{\mu}_{i(RML)}$ in formula (21)
and variance $\tilde{\sigma}^2_{i(RML)}$ in formula (22). Let $\bar{X}_{i(RML)}$ and
$S^2_{i(RML)}$ be the sample mean and sample variance of the i-th simulated sample,
and let $S^2_{X_i(RML)} = \sum_{j=1}^{n_i}\left(X_{ij(RML)} - \bar{X}_{i(RML)}\right)^2$.

For i, l = 1, 2, ..., k and i ≠ l, the difference of variance estimators based on
the simulated samples is defined by

$$\tilde{\theta}_{il(RML)} = \tilde{\theta}_{i(RML)} - \tilde{\theta}_{l(RML)} = \exp\left(2\bar{X}_{i(RML)} + \frac{S^2_{X_i(RML)}}{n_i}\right)\cdot\left(\exp\left(\frac{S^2_{X_i(RML)}}{n_i}\right) - 1\right) - \exp\left(2\bar{X}_{l(RML)} + \frac{S^2_{X_l(RML)}}{n_l}\right)\cdot\left(\exp\left(\frac{S^2_{X_l(RML)}}{n_l}\right) - 1\right). \qquad (24)$$

Therefore, the 100(1 − α)% two-sided simultaneous confidence intervals for
$\theta_i - \theta_l$, i, l = 1, 2, ..., k, i ≠ l, based on the simulation-based
approach are defined by

$$SCI_{il(SB)} = \left(\tilde{\theta}_{il(RML),(\alpha/2)},\; \tilde{\theta}_{il(RML),(1-\alpha/2)}\right), \qquad (25)$$

where $\tilde{\theta}_{il(RML),(\alpha/2)}$ and $\tilde{\theta}_{il(RML),(1-\alpha/2)}$ denote the (α/2)-th and the (1 − α/2)-th quantiles of $\tilde{\theta}_{il(RML)}$, respectively.
The simultaneous confidence intervals based on the simulation-based approach can be
constructed using the following Monte Carlo procedure (an illustrative code sketch
follows the algorithm):
Algorithm 2
Step 1 Obtain the MLEs of the parameters, $\tilde{\mu}_i$, $\tilde{\sigma}_i^2$ and $\tilde{\theta}_{il}$.
Step 2 Calculate the value of $\tilde{\mu}_{i(RML)}$ using formula (21), the value of $\tilde{\sigma}^2_{i(RML)}$ using formula (22), and the value of $\tilde{\theta}_{i(RML)}$ using formula (23).
Step 3 Generate simulated samples $X_{i(RML)} = (X_{i1(RML)}, X_{i2(RML)}, \ldots, X_{in_i(RML)})$ from $N(\tilde{\mu}_{i(RML)}, \tilde{\sigma}^2_{i(RML)})$ m = 1000 times and recalculate $\tilde{\theta}_{il(RML),1}, \tilde{\theta}_{il(RML),2}, \ldots, \tilde{\theta}_{il(RML),m}$.
Step 4 Define the ordered values $\tilde{\theta}_{il(RML),(1)} \le \tilde{\theta}_{il(RML),(2)} \le \ldots \le \tilde{\theta}_{il(RML),(m)}$.
Step 5 Compute the (α/2)-th and (1 − α/2)-th quantiles of $\tilde{\theta}_{il(RML)}$.
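A corresponding Python sketch of Algorithm 2 is given below (again an illustration only, not the authors' code); it reads $\tilde{\sigma}^2_{i(RML)}$ as the log-scale maximum likelihood variance $S_{X_i}^2/n_i$ of formula (22).

```python
import numpy as np

def sb_simultaneous_cis(x_groups, alpha=0.05, m=1000, seed=1):
    """Algorithm 2: simulation-based simultaneous CIs for all pairwise
    differences of log-normal variances.  x_groups is a list of arrays of
    log-transformed observations, one array per group."""
    rng = np.random.default_rng(seed)
    k = len(x_groups)
    n = [len(x) for x in x_groups]
    # Steps 1-2: restricted ML estimates, formulas (21)-(22)
    mu_rml = [np.mean(x) for x in x_groups]
    sig2_rml = [np.var(x) for x in x_groups]         # S_X^2 / n_i (ML variance)
    theta_sim = np.empty((m, k))
    for r in range(m):
        for i in range(k):
            # Step 3: simulate a sample of size n_i from N(mu_rml, sig2_rml)
            xs = rng.normal(mu_rml[i], np.sqrt(sig2_rml[i]), n[i])
            ss = np.sum((xs - xs.mean()) ** 2)        # S^2_{X_i(RML)}
            theta_sim[r, i] = (np.exp(2 * xs.mean() + ss / n[i])
                               * (np.exp(ss / n[i]) - 1))   # formula (23)
    # Steps 4-5: quantiles of the simulated pairwise differences, formula (25)
    return {(i, l): (np.quantile(theta_sim[:, i] - theta_sim[:, l], alpha / 2),
                     np.quantile(theta_sim[:, i] - theta_sim[:, l], 1 - alpha / 2))
            for i in range(k) for l in range(k) if i != l}
```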

3 Simulation Studies

A simulation study based on 5000 simulation runs was performed to evaluate the
performance of the proposed approaches and to compare the coverage probabilities,
average lengths and standard errors of the two approaches: the GCI approach and
the simulation-based approach.
The coverage probabilities of the two simultaneous confidence intervals can be
obtained using the following Monte Carlo procedure (an illustrative code sketch
follows the algorithm):

Algorithm 3
Step 1 Generate $X_i$ from the normal distribution with mean $\mu_i$ and variance $\sigma_i^2$.
Step 2 Calculate $\bar{x}_i$ and $s_i$ (the observed values of $\bar{X}_i$ and $S_i$).
Step 3 Construct the simultaneous confidence intervals based on the GCI approach from Algorithm 1 and record whether or not all the values of $\theta_{il} = \theta_i - \theta_l$, i, l = 1, 2, ..., k, i ≠ l, are in their corresponding $SCI_{GCI}$.
Step 4 Construct the simultaneous confidence intervals based on the simulation-based approach from Algorithm 2 and record whether or not all the values of $\theta_{il} = \theta_i - \theta_l$, i, l = 1, 2, ..., k, i ≠ l, are in their corresponding $SCI_{SB}$.
Step 5 Repeat Step 1 – Step 4 M = 5000 times.
Step 6 Compute the coverage probability as the fraction of times that all $\theta_{il} = \theta_i - \theta_l$, i, l = 1, 2, ..., k, i ≠ l, are in their corresponding simultaneous confidence intervals.
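For the GCI part of Algorithm 3, a coverage estimate can be obtained by wrapping the gci_simultaneous_cis sketch given above in an outer Monte Carlo loop; the sketch below is illustrative only (the simulation-based part is analogous), and with the paper's defaults M = 5000 and m = 1000 the run is computationally heavy.

```python
import numpy as np

def gci_coverage(n, sigma, mu=1.0, alpha=0.05, M=5000, m=1000, seed=1):
    """Algorithm 3 (GCI part): estimated coverage probability of the GCI-based
    simultaneous intervals for all pairwise differences theta_i - theta_l.
    Requires gci_simultaneous_cis() from the earlier sketch."""
    rng = np.random.default_rng(seed)
    n, sigma = np.asarray(n), np.asarray(sigma)
    k = len(n)
    # true log-normal variances, formula (8), with mu_1 = ... = mu_k = mu
    theta = np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1)
    hits = 0
    for _ in range(M):
        # Steps 1-2: generate normal (log-scale) samples and their statistics
        xs = [rng.normal(mu, sigma[i], n[i]) for i in range(k)]
        xbar = [x.mean() for x in xs]
        s2 = [x.var(ddof=1) for x in xs]
        # Step 3: GCI simultaneous intervals from Algorithm 1
        cis = gci_simultaneous_cis(xbar, s2, n, alpha, m,
                                   seed=int(rng.integers(2**31 - 1)))
        # Step 6: count runs in which every interval covers its true difference
        hits += all(lo <= theta[i] - theta[l] <= hi
                    for (i, l), (lo, hi) in cis.items())
    return hits / M
```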

Table 1. The coverage probabilities (CP), average lengths (AL) and standard errors
(s.e.) of 95% two-sided simultaneous confidence intervals for all differences of variances
of log-normal distributions: 3 sample cases.

(n1 , n2 , n3 ) (σ1 , σ2 , σ3 ) SCIGCI SCISB


CP AL (s.e.) CP AL (s.e.)
(20,20,20) (0.05,0.05,0.05) 0.9486 0.0447 (0.0038) 0.9482 0.0314 (0.0026)
(0.05,0.10,0.15) 0.9495 0.2539 (0.0656) 0.9009 0.1793 (0.0455)
(0.15,0.15,0.15) 0.9531 0.4518 (0.0428) 0.9523 0.3104 (0.0285)
(30,30,30) (0.05,0.05,0.05) 0.9526 0.0328 (0.0023) 0.9526 0.0260 (0.0018)
(0.05,0.10,0.15) 0.9521 0.1856 (0.0468) 0.9133 0.1481 (0.0369)
(0.15,0.15,0.15) 0.9468 0.3284 (0.0251) 0.9466 0.2567 (0.0193)
(50,50,50) (0.05,0.05,0.05) 0.9499 0.0234 (0.0013) 0.9507 0.0204 (0.0011)
(0.05,0.10,0.15) 0.9501 0.1340 (0.0332) 0.9299 0.1174 (0.0289)
(0.15,0.15,0.15) 0.9524 0.2315 (0.0138) 0.9531 0.1999 (0.0116)
(100,100,100) (0.05,0.05,0.05) 0.9506 0.0156 (0.0006) 0.9519 0.0145 (0.0006)
(0.05,0.10,0.15) 0.9535 0.0892 (0.0217) 0.9411 0.0836 (0.0203)
(0.15,0.15,0.15) 0.9503 0.1524 (0.0065) 0.9502 0.1418 (0.0060)
(200,200,200) (0.05,0.05,0.05) 0.9483 0.0107 (0.0003) 0.9479 0.0103 (0.0003)
(0.05,0.10,0.15) 0.9459 0.0616 (0.0151) 0.9403 0.0597 (0.0146)
(0.15,0.15,0.15) 0.9529 0.1042 (0.0034) 0.9519 0.1005 (0.0032)

The simulation study was performed with the following values of the factors:
(1) sample cases: k = 3 and k = 5; (2) population means: μ1 = μ2 = . . . =
μk = 1; (3) population standard deviations: σ1 , σ2 , . . . , σk ; (4) sample sizes:
n1 , n2 , . . . , nk ; (5) significance level: α = 0.05. The specific combinations are
given in the following two tables.
Tables 1 and 2 report the coverage probabilities, average lengths and standard
errors of the simultaneous confidence intervals for k = 3 and 5 sample cases,
respectively. From Tables 1 and 2, the coverage probabilities of the GCI approach
are close to the nominal confidence level 0.95 in all cases. For n_i ≤ 100, the
coverage probabilities of the simulation-based approach are close to the nominal
confidence level 0.95 when the standard deviations are equal, whereas they fall
below the nominal confidence level 0.95 when the standard deviations differ. For
n_i > 100, the coverage probabilities of the simulation-based approach are close to
the nominal confidence level 0.95. Comparing the average lengths and standard
errors, the average lengths and standard errors of the simulation-based approach
are smaller than those of the GCI approach in all cases.

4 An Empirical Application

Consider the data taken from Hand et al. [3], Schaarschmidt [10] and Sadooghi-Alvandi
and Malekzadeh [9]. The data set consists of 57 observations of nitrogen-bound bovine
serum albumin in k = 3 groups of mice. The data were categorized into three groups
depending on the type of mice: normal mice (Group 1), alloxan-induced diabetic mice
(Group 2), and alloxan-induced diabetic mice treated with insulin (Group 3). The
summary statistics are as follows: n_1 = 20, n_2 = 18, n_3 = 19, x̄_1 = 4.859,
x̄_2 = 4.867, x̄_3 = 4.397, s_1² = 0.927, s_2² = 0.850, s_3² = 0.696,
s²_{X_1} = 17.613, s²_{X_2} = 14.45, s²_{X_3} = 12.528, θ_1 = 56612.670,
θ_2 = 46406.680, and θ_3 = 11904.000.

Table 2. The coverage probabilities (CP), average lengths (AL) and standard errors
(s.e.) of 95% two-sided simultaneous confidence intervals for all differences of variances
of log-normal distributions: 5 sample cases.

(n1 , n2 , n3 , n4 , n5 ) (σ1 , σ2 , σ3 , σ4 , σ5 ) SCIGCI SCISB


CP AL (s.e.) CP AL (s.e.)
(20,20,20,20,20) (0.05,0.05,0.05,0.05,0.05) 0.9494 0.0447 (0.0026) 0.9495 0.0314 (0.0018)
(0.05,0.05,0.10,0.15,0.15) 0.9503 0.2617 (0.0434) 0.9052 0.1842 (0.0300)
(0.15,0.15,0.15,0.15,0.15) 0.9508 0.4528 (0.0290) 0.9490 0.3108 (0.0193)
(30,30,30,30,30) (0.05,0.05,0.05,0.05,0.05) 0.9494 0.0328 (0.0015) 0.9488 0.0260 (0.0012)
(0.05,0.05,0.10,0.15,0.15) 0.9510 0.1922 (0.0305) 0.9230 0.1530 (0.0240)
(0.15,0.15,0.15,0.15,0.15) 0.9518 0.3273 (0.0168) 0.9513 0.2559 (0.0129)
(50,50,50,50,50) (0.05,0.05,0.05,0.05,0.05) 0.9518 0.0234 (0.0009) 0.9512 0.0204 (0.0007)
(0.05,0.05,0.10,0.15,0.15) 0.9516 0.1370 (0.0208) 0.9334 0.1196 (0.0181)
(0.15,0.15,0.15,0.15,0.15) 0.9489 0.2316 (0.0093) 0.9489 0.2001 (0.0079)
(100,100,100,100,100) (0.05,0.05,0.05,0.05,0.05) 0.9497 0.0156 (0.0004) 0.9500 0.0145 (0.0004)
(0.05,0.05,0.10,0.15,0.15) 0.9480 0.0918 (0.0136) 0.9394 0.0859 (0.0127)
(0.15,0.15,0.15,0.15,0.15) 0.9507 0.1526 (0.0044) 0.9508 0.1419 (0.0040)
(200,200,200,200,200) (0.05,0.05,0.05,0.05,0.05) 0.9480 0.0107 (0.0002) 0.9489 0.0103 (0.0002)
(0.05,0.05,0.10,0.15,0.15) 0.9478 0.0632 (0.0092) 0.9435 0.0611 (0.0089)
(0.15,0.15,0.15,0.15,0.15) 0.9513 0.1044 (0.0022) 0.9512 0.1007 (0.0021)

The 95% two-sided simultaneous confidence intervals for the differences of variances
are given in Table 3. It is clear from the table that the simulation-based approach
yields shorter intervals than the GCI approach.

Table 3. The 95% two-sided simultaneous confidence intervals for all pairwise differ-
ences of variances of log-normal distributions.

Comparison CIGCI CISB


Lower Upper Lower Upper
Group 1/Group 2 –764974.00 663809.50 –241820.00 192428.80
Group 1/Group 3 –907064.70 55546.79 –272166.90 16233.17
Group 2/Group 3 –755857.60 74883.13 –231398.70 15659.84

5 Discussion and Conclusions


In this paper, the GCI approach and the simulation-based approach are introduced to
construct the simultaneous confidence intervals for all differences of variances of
log-normal distributions. The coverage probabilities, average lengths and standard
errors are considered. The simulation studies showed that the coverage probabilities
of the GCI approach are close to the nominal confidence level 0.95 in all cases. The
coverage probabilities of the simulation-based approach are very close to the nominal
confidence level 0.95 when the standard deviations are equal; when the standard
deviations differ, the simulation-based approach has coverage probabilities below the
nominal confidence level 0.95. The average lengths of the simulation-based approach
are slightly smaller than those of the GCI approach in all cases with equal standard
deviations. Hence, the simulation-based approach is recommended when the standard
deviations are equal; otherwise, the GCI approach is recommended.

Acknowledgements. This research was funded by King Mongkut’s University of


Technology North Bangkok. Contract no. KMUTNB-61-DRIVE-006.

References
1. Cojbasica, V., Tomovica, A.: Nonparametric confidence intervals for population
variance of one sample and the difference of variances of two samples. Comput.
Stat. Data Anal. 51, 5562–5578 (2007)
2. Hannig, J., Lidong, E., Abdel-Karim, A., Iyer, H.: Simultaneous fiducial generalized
confidence intervals for ratios of means of lognormal distributions. Austrian J. Stat.
35, 261–269 (2006)
3. Hand, D.J., Daly, F., McConway, K., Lunn, D., Ostrowski, E.: A Handbook of
Small Data Sets. Chapman and Hall/CRC, London (1994)

4. Herbert, R.D., Hayen, A., Macaskill, P., Walter, S.D.: Interval estimation for the
difference of two independent variances. Commun. Stat. Simul. Comput. 40, 744–
758 (2011)
5. Niwitpong, S.: Confidence intervals for the difference of two normal population
variances. World Acad. Sci. Eng. Technol. 80, 602–605 (2011)
6. Niwitpong, S.: A note on coverage probability of confidence interval for the differ-
ence between two normal variances. Appl. Math. Sci. 6, 3313–3320 (2012)
7. Pal, N., Lim, W.K., Ling, C.H.: A computational approach to statistical inferences.
J. Appl. Probab. Stat. 2, 13–35 (2007)
8. Royston, P.: The lognormal distribution as a model for survival time in cancer,
with an emphasis on prognostic factors. Statistica Neerlandica 55, 89–104 (2001)
9. Sadooghi-Alvandi, S.M., Malekzadeh, A.: Simultaneous confidence intervals for
ratios of means of several lognormal distributions: a parametric bootstrap app-
roach. Comput. Stat. Data Anal. 69, 133–140 (2014)
10. Schaarschmidt, F.: Simultaneous confidence intervals for multiple comparisons
among expected values of log-normal variables. Comput. Stat. Data Anal. 58,
265–275 (2013)
11. Shen, W.H.: Estimation of parameters of a lognormal distribution. Taiwan. J.
Math. 2, 243–250 (1998)
12. Thangjai, W., Niwitpong, S., Niwitpong, S.: Simultaneous confidence intervals for
all differences of means of normal distributions with unknown coefficients of vari-
ation. In: Studies in Computational Intelligence, vol. 753, pp. 670–682 (2018)
13. Thangjai, W., Niwitpong, S., Niwitpong, S.: Simultaneous confidence intervals for
all differences of means of two-parameter exponential distributions. In: Studies in
Computational Intelligence, vol. 760, pp. 298–308 (2018)
14. Tian, L., Wu, J.: Inferences on the common mean of several log-normal populations:
the generalized variable approach. Biom. J. 49, 944–951 (2007)
15. Weerahandi, S.: Generalized confidence intervals. J. Am. Stat. Assoc. 88, 899–905
(1993)
Confidence Intervals for the Inverse Mean
and Difference of Inverse Means
of Normal Distributions with Unknown
Coefficients of Variation

Warisa Thangjai(B) , Sa-Aat Niwitpong, and Suparat Niwitpong

Faculty of Applied Science, Department of Applied Statistics,


King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
wthangjai@yahoo.com, {sa-aat.n,suparat.n}@sci.kmutnb.ac.th

Abstract. This paper investigates the performance of the confidence


intervals for a single inverse mean and the difference of two inverse means
of normal distributions with unknown coefficients of variation (CVs).
The confidence intervals for the inverse mean with unknown coefficient
of variation (CV) were constructed based on the generalized confidence
interval (GCI) approach and the large sample approach. The generalized
confidence interval and large sample confidence interval for the inverse
mean with unknown CV were compared with the generalized confidence
interval for the inverse mean of Niwitpong and Wongkhao [5]. Moreover,
the confidence intervals for the difference of inverse means with unknown
CVs were constructed using the GCI approach, large sample approach
and method of variance estimates recovery (MOVER) approach and then
compared with existing confidence interval for the difference of inverse
means based on the GCI approach of Niwitpong and Wongkhao [6]. The
coverage probability and average length of the confidence intervals were
evaluated by a Monte Carlo simulation. Carrying out the simulation stud-
ies, the results showed that the generalized confidence interval provides
the best confidence interval for the inverse mean with unknown CV. The
generalized confidence interval and the MOVER confidence interval for
the difference of inverse means with unknown CVs perform well in terms
of the coverage probability and average length. Finally, two real datasets
are analyzed to illustrate the proposed confidence intervals.

Keywords: GCI approach · MOVER approach · Monte Carlo simulation

1 Introduction
In statistics, a normal distribution is the most important probability distribution.
The mean and variance of the normal population are denoted μ and σ 2 , respec-
tively. The sample mean x̄ is the uniformly minimum variance unbiased (UMVU)
estimator of the normal population mean μ. Searls [9] introduced the minimum
mean squared error (MMSE) estimator for the normal population mean with
known CV, where the CV is defined as σ/μ. However, the CV needs to be esti-
mated in practice. Therefore, Srivastava [10] proposed the UMVU estimator to
estimate the normal population mean with unknown CV. For more details about
the mean of normal distribution with unknown CV, see the research papers of
Sahai [7], Sahai and Acharya [8], Sodanin et al. [11], and Thangjai et al. [13].
The inverse mean is the reciprocal of mean, 1/μ. It has been used in experi-
mental nuclear physics, econometrics, and biological sciences. Several researchers
have studied confidence interval estimation for the inverse mean of the normal
distribution. For example, Niwitpong and Wongkhao [5] constructed new
confidence intervals for the inverse mean. Niwitpong and Wongkhao [6] pro-
posed the new confidence intervals for the difference of inverse means. Thangjai
et al. [12] investigated the performance of the confidence intervals for the com-
mon inverse mean based on the GCI approach and the large sample approach.
Thangjai et al. [14] extended the research work of Thangjai et al. [12]. Thangjai
et al. [14] proposed the adjusted MOVER approach to construct the confidence
interval for the common inverse mean.
Suppose X = (X1 , X2 , . . . , Xn ) is a random sample of size n from all possible
distributions. Let L(X) and U (X) be the lower limit and the upper limit of the
confidence interval for the mean corresponding to a given nominal confidence
level 1 − α. By definition, this means that if X = (X1 , X2 , . . . , Xn ) is an inde-
pendent and identically distributed (i.i.d.) sample from the actual distribution,
then the actual mean M will be between L(X) and U (X). It can be written as
P (L(X) ≤ M ≤ U (X)) = 1 − α. By taking inverse values of all three values, it
concludes that 1/U (X) ≤ 1/M ≤ 1/L(X). This means that if a random sample
is taken, then 1/M will be between 1/U (X) and 1/L(X) with nominal confidence
level 1 − α. That is P (1/U (X) ≤ 1/M ≤ 1/L(X)) = 1 − α. In other words, if
(L(X), U (X)) is the confidence interval for the mean, then (1/U (X), 1/L(X)) is
automatically a confidence interval for the inverse mean.
To our knowledge, no paper exists on the inverse mean and the difference of inverse
means of normal distributions with unknown CVs. Therefore, this paper fills this gap
by developing novel approaches and extends the work of Thangjai et al. [12],
Thangjai et al. [13] and Thangjai et al. [14] to construct
confidence intervals for the single inverse mean and difference of inverse means
of normal distributions with unknown CVs. The confidence intervals for the
single inverse mean with unknown CV were constructed based on the general-
ized confidence interval and the large sample confidence interval and compared
with the generalized confidence interval for the inverse mean of Niwitpong and
Wongkhao [5]. Moreover, the confidence intervals for the difference of inverse
means with unknown CVs were proposed using the generalized confidence inter-
val, large sample confidence interval, and method of variance estimates recovery
(MOVER) confidence interval. Three confidence intervals were compared with
the generalized confidence interval for the difference of inverse means of Niwit-
pong and Wongkhao [6].

This paper is organized as follows. In Sect. 2, the confidence intervals for the
single inverse mean with unknown CV are described. In Sect. 3, the confidence
intervals for the difference of inverse means with unknown CVs are provided. In
Sect. 4, simulation results are presented to evaluate the coverage probabilities and
average lengths of the proposed approaches. Section 5 illustrates the proposed
approaches using two examples. Finally, Sect. 6 summarizes this paper.

2 Confidence Intervals for the Inverse Mean of Normal


Distribution with Unknown Coefficient of Variation

Let X = (X_1, X_2, ..., X_n) be a random sample from a normal distribution with
mean μ and variance σ². The CV is defined as the standard deviation divided by the
mean, τ = σ/μ. Let $\bar{X}$ and $S^2$ be the sample mean and sample variance of X,
respectively. Hence, the estimator of the CV is $\hat{\tau} = \hat{\sigma}/\hat{\mu} = S/\bar{X}$.
Also, let $\bar{x}$ and $s^2$ be the observed values of $\bar{X}$ and $S^2$, respectively.

Searls [9] proposed the following minimum mean squared error (MMSE) estimator for
the mean of a normal population with variance σ²:

$$\eta = \frac{\mu}{1 + (\sigma^2/n\mu^2)} = \frac{n\mu}{n + (\sigma^2/\mu^2)}. \qquad (1)$$

The estimator of the mean of a normal population with unknown CV was introduced by
Srivastava [10]. The estimator of Srivastava [10] has the following form:

$$\hat{\eta} = \frac{\bar{X}}{1 + S^2/n\bar{X}^2} = \frac{n\bar{X}}{n + S^2/\bar{X}^2}. \qquad (2)$$

From Eqs. (1) and (2), the inverse mean of a normal population with unknown CV, θ,
and the estimator of θ are

$$\theta = \frac{1}{\eta} = \frac{1 + \sigma^2/n\mu^2}{\mu} = \frac{n + \sigma^2/\mu^2}{n\mu} \qquad (3)$$

and

$$\hat{\theta} = \frac{1}{\hat{\eta}} = \frac{1 + S^2/n\bar{X}^2}{\bar{X}} = \frac{n + S^2/\bar{X}^2}{n\bar{X}}. \qquad (4)$$
Theorem 2.1. Let X = (X_1, X_2, ..., X_n) be a random sample from N(μ, σ²). Let
$\bar{X}$ and $S^2$ be the sample mean and sample variance of X, respectively. Let θ
be the inverse mean of a normal population with unknown CV and let $\hat{\theta}$ be
the estimator of θ. The mean and variance of $\hat{\theta}$ are

$$E(\hat{\theta}) = \left(1 + \frac{\sigma^2}{n\mu^2 + \sigma^2} + \frac{2\sigma^6 + 4n\mu^2\sigma^4}{(n\mu^2 + \sigma^2)^3}\right)\cdot\theta \qquad (5)$$

and

$$\mathrm{Var}(\hat{\theta}) = \left[\frac{1}{\mu} + \frac{1}{\mu}\cdot\frac{\sigma^2}{n\mu^2 + \sigma^2}\cdot\left(1 + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2 + \sigma^2)^2}\right)\right]^2\cdot\left[\frac{\left(\frac{n\sigma^2}{n\mu^2 + \sigma^2}\right)^2\left(\frac{2}{n} + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2 + \sigma^2)^2}\right)}{\left(n + \frac{n\sigma^2}{n\mu^2 + \sigma^2}\left(1 + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2 + \sigma^2)^2}\right)\right)^2} + \frac{\sigma^2}{n\mu^2}\right]. \qquad (6)$$

Proof. Let $\theta = (n + \sigma^2/\mu^2)/(n\mu)$ and $\hat{\theta} = (n + S^2/\bar{X}^2)/(n\bar{X})$. Recall that $\bar{X} \sim N(\mu, \sigma^2/n)$. Then the mean and variance of $n\bar{X}$ are

$$E(n\bar{X}) = nE(\bar{X}) = n\mu \qquad\text{and}\qquad \mathrm{Var}(n\bar{X}) = n^2\,\mathrm{Var}(\bar{X}) = n\sigma^2.$$

According to Thangjai et al. [13], the mean and variance of $\bar{X}^2$ have the following form:

$$E(\bar{X}^2) = \frac{n\mu^2 + \sigma^2}{n} \qquad\text{and}\qquad \mathrm{Var}(\bar{X}^2) = \frac{2\sigma^4 + 4n\mu^2\sigma^2}{n^2}.$$

The mean and variance of $S^2/\bar{X}^2$ are

$$E\left(\frac{S^2}{\bar{X}^2}\right) = \frac{n\sigma^2}{n\mu^2 + \sigma^2}\cdot\left(1 + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2 + \sigma^2)^2}\right)$$

and

$$\mathrm{Var}\left(\frac{S^2}{\bar{X}^2}\right) = \left(\frac{n\sigma^2}{n\mu^2 + \sigma^2}\right)^2\cdot\left(\frac{2}{n} + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2 + \sigma^2)^2}\right).$$

Therefore, the mean and variance of $n + S^2/\bar{X}^2$ are defined by

$$E\left(n + \frac{S^2}{\bar{X}^2}\right) = n + \frac{n\sigma^2}{n\mu^2 + \sigma^2}\cdot\left(1 + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2 + \sigma^2)^2}\right)$$

and

$$\mathrm{Var}\left(n + \frac{S^2}{\bar{X}^2}\right) = \mathrm{Var}\left(\frac{S^2}{\bar{X}^2}\right) = \left(\frac{n\sigma^2}{n\mu^2 + \sigma^2}\right)^2\cdot\left(\frac{2}{n} + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2 + \sigma^2)^2}\right).$$

Following Blumenfeld [1], the mean and variance of $\hat{\theta}$ are obtained by

$$E(\hat{\theta}) = E\left(\frac{n + S^2/\bar{X}^2}{n\bar{X}}\right) = \frac{E\left(n + S^2/\bar{X}^2\right)}{E(n\bar{X})}\cdot\left(1 + \frac{\mathrm{Var}(n\bar{X})}{\left[E(n\bar{X})\right]^2}\right)$$
$$= \left(1 + \frac{\sigma^2}{n\mu^2 + \sigma^2}\left(1 + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2 + \sigma^2)^2}\right)\right)\cdot\frac{n^2\mu^2 + n\sigma^2}{n^2\mu^3}$$
$$= \left(1 + \frac{\sigma^2}{n\mu^2 + \sigma^2} + \frac{2\sigma^6 + 4n\mu^2\sigma^4}{(n\mu^2 + \sigma^2)^3}\right)\cdot\frac{n\mu^2\left(n + \sigma^2/\mu^2\right)}{n^2\mu^3}$$
$$= \left(1 + \frac{\sigma^2}{n\mu^2 + \sigma^2} + \frac{2\sigma^6 + 4n\mu^2\sigma^4}{(n\mu^2 + \sigma^2)^3}\right)\cdot\frac{n + \sigma^2/\mu^2}{n\mu}$$
$$= \left(1 + \frac{\sigma^2}{n\mu^2 + \sigma^2} + \frac{2\sigma^6 + 4n\mu^2\sigma^4}{(n\mu^2 + \sigma^2)^3}\right)\cdot\theta$$

and

$$\mathrm{Var}(\hat{\theta}) = \mathrm{Var}\left(\frac{n + S^2/\bar{X}^2}{n\bar{X}}\right) = \left(\frac{E\left(n + S^2/\bar{X}^2\right)}{E(n\bar{X})}\right)^2\cdot\left(\frac{\mathrm{Var}\left(n + S^2/\bar{X}^2\right)}{\left[E\left(n + S^2/\bar{X}^2\right)\right]^2} + \frac{\mathrm{Var}(n\bar{X})}{\left[E(n\bar{X})\right]^2}\right)$$
$$= \left(\frac{n + \frac{n\sigma^2}{n\mu^2+\sigma^2}\left(1 + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2+\sigma^2)^2}\right)}{n\mu}\right)^2\cdot\left(\frac{\left(\frac{n\sigma^2}{n\mu^2+\sigma^2}\right)^2\left(\frac{2}{n} + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2+\sigma^2)^2}\right)}{\left(n + \frac{n\sigma^2}{n\mu^2+\sigma^2}\left(1 + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2+\sigma^2)^2}\right)\right)^2} + \frac{n\sigma^2}{n^2\mu^2}\right)$$
$$= \left[\frac{1}{\mu} + \frac{1}{\mu}\cdot\frac{\sigma^2}{n\mu^2+\sigma^2}\cdot\left(1 + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2+\sigma^2)^2}\right)\right]^2\cdot\left[\frac{\left(\frac{n\sigma^2}{n\mu^2+\sigma^2}\right)^2\left(\frac{2}{n} + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2+\sigma^2)^2}\right)}{\left(n + \frac{n\sigma^2}{n\mu^2+\sigma^2}\left(1 + \frac{2\sigma^4 + 4n\mu^2\sigma^2}{(n\mu^2+\sigma^2)^2}\right)\right)^2} + \frac{\sigma^2}{n\mu^2}\right],$$

which are the expressions in (5) and (6). Hence, Theorem 2.1 is proved.
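The approximations (5)-(6) can be checked numerically. The following sketch is an illustration only (parameter values are arbitrary choices of ours): it simulates $\hat{\theta} = (n + S^2/\bar{X}^2)/(n\bar{X})$ many times and compares the empirical mean and variance with the theoretical expressions; since the derivation relies on moment approximations for ratios, only approximate agreement should be expected.

```python
import numpy as np

def check_theorem_2_1(mu=1.0, sigma=0.1, n=50, reps=200_000, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)
    theta_hat = (n + s2 / xbar**2) / (n * xbar)                 # Eq. (4)
    # theoretical values from Theorem 2.1
    theta = (n + sigma**2 / mu**2) / (n * mu)                   # Eq. (3)
    a = sigma**2 / (n * mu**2 + sigma**2)
    b = (2 * sigma**4 + 4 * n * mu**2 * sigma**2) / (n * mu**2 + sigma**2)**2
    mean_th = (1 + a + (2 * sigma**6 + 4 * n * mu**2 * sigma**4)
               / (n * mu**2 + sigma**2)**3) * theta             # Eq. (5)
    var_th = ((1 / mu) * (1 + a * (1 + b)))**2 * (
        (n * a)**2 * (2 / n + b) / (n + n * a * (1 + b))**2
        + sigma**2 / (n * mu**2))                               # Eq. (6)
    print("mean:     simulated %.6f  theoretical %.6f" % (theta_hat.mean(), mean_th))
    print("variance: simulated %.3e  theoretical %.3e" % (theta_hat.var(), var_th))

check_theorem_2_1()
```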

2.1 Generalized Confidence Interval for the Inverse Mean


of Normal Distribution with Unknown Coefficient of Variation
Definition 2.1. Let X = (X1 , X2 , . . . , Xn ) be a random sample from a distri-
bution F (x|γ) which depends on a vector of parameters γ = (θ, ν) where θ is
parameter of interest and ν is possibly a vector of nuisance parameters. Weer-
ahandi [15] defines a generalized pivotal quantity R(X, x, θ, ν) for confidence
interval estimation, where x is an observed value of X, as a random variable
having the following two properties:
(i) R(X, x, θ, ν) has a probability distribution that is free of all unknown param-
eters.
(ii) The observed value of R(X, x, θ, ν), X = x, is the parameter of interest.
Let R(α) be the 100(α) -th percentile of R(X, x, θ, ν). Along these lines,
(R(α/2), R(1 − α/2)) becomes a 100(1 − α)% two-sided generalized confi-
dence interval for the parameter of interest.
Recall that

$$\frac{(n-1)S^2}{\sigma^2} = V \sim \chi^2_{n-1}, \qquad (7)$$

where V has a chi-squared distribution with n − 1 degrees of freedom. Therefore, the
generalized pivotal quantity for σ² is

$$R_{\sigma^2} = \frac{(n-1)s^2}{V}. \qquad (8)$$

The mean has the following form:

$$\mu \approx \bar{x} - \frac{Z}{\sqrt{U}}\sqrt{\frac{(n-1)s^2}{n}}, \qquad (9)$$

where Z and U denote a standard normal random variable and a chi-square random
variable with n − 1 degrees of freedom, respectively. Therefore, the generalized
pivotal quantity for μ is

$$R_{\mu} = \bar{x} - \frac{Z}{\sqrt{U}}\sqrt{\frac{(n-1)s^2}{n}}. \qquad (10)$$

The generalized pivotal quantity for θ is

$$R_{\theta} = \frac{n + R_{\sigma^2}/R_{\mu}^2}{nR_{\mu}}, \qquad (11)$$

where $R_{\sigma^2}$ and $R_{\mu}$ are defined in Eqs. (8) and (10).

Therefore, the 100(1 − α)% two-sided confidence interval for the inverse mean of a
normal distribution with unknown CV based on the GCI approach is

$$CI_{GCI.\theta} = \left(R_{\theta}(\alpha/2),\; R_{\theta}(1 - \alpha/2)\right), \qquad (12)$$

where $R_{\theta}(\alpha)$ denotes the 100(α)-th percentile of $R_{\theta}$.
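A minimal Python sketch of this GCI (illustration only; the function name and defaults are ours) is given below; applied to the summary statistics of Example 1 in Sect. 5 it should give an interval close to (0.0209, 0.0227), up to Monte Carlo error.

```python
import numpy as np

def gci_inverse_mean(xbar, s2, n, alpha=0.05, m=5000, seed=1):
    """GCI for the inverse mean with unknown CV, Eqs. (8)-(12)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(m)
    U = rng.chisquare(n - 1, m)
    V = rng.chisquare(n - 1, m)
    R_mu = xbar - Z / np.sqrt(U) * np.sqrt((n - 1) * s2 / n)    # Eq. (10)
    R_sig2 = (n - 1) * s2 / V                                   # Eq. (8)
    R_theta = (n + R_sig2 / R_mu**2) / (n * R_mu)               # Eq. (11)
    return tuple(np.quantile(R_theta, [alpha / 2, 1 - alpha / 2]))  # Eq. (12)

# Example 1 (software defect data): n = 32, xbar = 45.9688, s^2 = 27.7732
print(gci_inverse_mean(45.9688, 27.7732, 32))
```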



2.2 Large Sample Confidence Interval for the Inverse Mean of


Normal Distribution with Unknown Coefficient of Variation

Using the normal approximation, the pivotal statistic is

$$Z = \frac{\hat{\theta} - E(\hat{\theta})}{\sqrt{\mathrm{Var}(\hat{\theta})}} = \frac{\hat{\theta} - \theta}{\sqrt{\mathrm{Var}(\hat{\theta})}}. \qquad (13)$$

Therefore, the 100(1 − α)% two-sided confidence interval for the inverse mean of a
normal distribution with unknown CV based on the large sample approach is

$$CI_{LS.\theta} = \left(\hat{\theta} - z_{1-\alpha/2}\sqrt{\mathrm{Var}(\hat{\theta})},\; \hat{\theta} + z_{1-\alpha/2}\sqrt{\mathrm{Var}(\hat{\theta})}\right), \qquad (14)$$

where $\hat{\theta}$ is defined in Eq. (4), $\mathrm{Var}(\hat{\theta})$ is defined in
Eq. (6) with μ and σ² replaced by $\bar{x}$ and $s^2$, respectively, and
$z_{1-\alpha/2}$ denotes the (1 − α/2)-th quantile of the standard normal distribution.
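For comparison, a sketch of the large sample interval (14), with μ and σ² in Eq. (6) replaced by x̄ and s², is shown below; it is only an illustrative implementation (the helper name and scipy usage are our choices).

```python
import numpy as np
from scipy import stats

def ls_ci_inverse_mean(xbar, s2, n, alpha=0.05):
    """Large sample CI of Eq. (14) for the inverse mean with unknown CV."""
    theta_hat = (n + s2 / xbar**2) / (n * xbar)                 # Eq. (4)
    # plug-in version of Var(theta_hat) in Eq. (6)
    a = s2 / (n * xbar**2 + s2)
    b = (2 * s2**2 + 4 * n * xbar**2 * s2) / (n * xbar**2 + s2)**2
    var_hat = ((1 / xbar) * (1 + a * (1 + b)))**2 * (
        (n * a)**2 * (2 / n + b) / (n + n * a * (1 + b))**2
        + s2 / (n * xbar**2))
    half = stats.norm.ppf(1 - alpha / 2) * np.sqrt(var_hat)
    return theta_hat - half, theta_hat + half
```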

3 Confidence Intervals for the Difference of Inverse Means


of Normal Distributions with Unknown Coefficients
of Variation

Let X = (X_1, X_2, ..., X_n) be a random sample from a normal distribution with mean
$\mu_X$ and variance $\sigma_X^2$. Let $\bar{X}$ and $S_X^2$ be the sample mean and
sample variance of X, respectively. Also, let $\bar{x}$ and $s_X^2$ be the observed
values of $\bar{X}$ and $S_X^2$, respectively. Furthermore, let Y = (Y_1, Y_2, ..., Y_m)
be a random sample from a normal distribution with mean $\mu_Y$ and variance
$\sigma_Y^2$. Let $\bar{Y}$ and $S_Y^2$ be the sample mean and sample variance of Y,
respectively. Also, let $\bar{y}$ and $s_Y^2$ be the observed values of $\bar{Y}$ and
$S_Y^2$, respectively. The samples X and Y are independent.

Let $\delta = \theta_X - \theta_Y$ be the difference of inverse means with unknown
CVs, where $\theta_X$ and $\theta_Y$ are the inverse means with unknown CVs of X and
Y, respectively. The estimator of δ is

$$\hat{\delta} = \hat{\theta}_X - \hat{\theta}_Y = \frac{n + S_X^2/\bar{X}^2}{n\bar{X}} - \frac{m + S_Y^2/\bar{Y}^2}{m\bar{Y}}, \qquad (15)$$

where $\hat{\theta}_X$ and $\hat{\theta}_Y$ denote the estimators of $\theta_X$ and $\theta_Y$, respectively.

Theorem 3.1. Let X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_m) be random
samples from $N(\mu_X, \sigma_X^2)$ and $N(\mu_Y, \sigma_Y^2)$, respectively. Let X
and Y be independent. Suppose that $\bar{X}$ and $S_X^2$ are the sample mean and
sample variance of X, respectively. Also, suppose that $\bar{Y}$ and $S_Y^2$ are the
sample mean and sample variance of Y, respectively. Let $\theta_X$ and $\theta_Y$ be
the inverse means with
unknown CVs of X and Y, respectively. Let δ be the difference between $\theta_X$ and
$\theta_Y$ and let $\hat{\delta}$ be the estimator of δ. The mean and variance of
$\hat{\delta}$ are

$$E(\hat{\delta}) = \left(1 + \frac{\sigma_X^2}{n\mu_X^2 + \sigma_X^2} + \frac{2\sigma_X^6 + 4n\mu_X^2\sigma_X^4}{(n\mu_X^2 + \sigma_X^2)^3}\right)\cdot\theta_X - \left(1 + \frac{\sigma_Y^2}{m\mu_Y^2 + \sigma_Y^2} + \frac{2\sigma_Y^6 + 4m\mu_Y^2\sigma_Y^4}{(m\mu_Y^2 + \sigma_Y^2)^3}\right)\cdot\theta_Y \qquad (16)$$

and

$$\mathrm{Var}(\hat{\delta}) = \left[\frac{1}{\mu_X} + \frac{1}{\mu_X}\cdot\frac{\sigma_X^2}{n\mu_X^2 + \sigma_X^2}\cdot\left(1 + \frac{2\sigma_X^4 + 4n\mu_X^2\sigma_X^2}{(n\mu_X^2 + \sigma_X^2)^2}\right)\right]^2\cdot\left[\frac{\left(\frac{n\sigma_X^2}{n\mu_X^2 + \sigma_X^2}\right)^2\left(\frac{2}{n} + \frac{2\sigma_X^4 + 4n\mu_X^2\sigma_X^2}{(n\mu_X^2 + \sigma_X^2)^2}\right)}{\left(n + \frac{n\sigma_X^2}{n\mu_X^2 + \sigma_X^2}\left(1 + \frac{2\sigma_X^4 + 4n\mu_X^2\sigma_X^2}{(n\mu_X^2 + \sigma_X^2)^2}\right)\right)^2} + \frac{\sigma_X^2}{n\mu_X^2}\right]$$
$$+ \left[\frac{1}{\mu_Y} + \frac{1}{\mu_Y}\cdot\frac{\sigma_Y^2}{m\mu_Y^2 + \sigma_Y^2}\cdot\left(1 + \frac{2\sigma_Y^4 + 4m\mu_Y^2\sigma_Y^2}{(m\mu_Y^2 + \sigma_Y^2)^2}\right)\right]^2\cdot\left[\frac{\left(\frac{m\sigma_Y^2}{m\mu_Y^2 + \sigma_Y^2}\right)^2\left(\frac{2}{m} + \frac{2\sigma_Y^4 + 4m\mu_Y^2\sigma_Y^2}{(m\mu_Y^2 + \sigma_Y^2)^2}\right)}{\left(m + \frac{m\sigma_Y^2}{m\mu_Y^2 + \sigma_Y^2}\left(1 + \frac{2\sigma_Y^4 + 4m\mu_Y^2\sigma_Y^2}{(m\mu_Y^2 + \sigma_Y^2)^2}\right)\right)^2} + \frac{\sigma_Y^2}{m\mu_Y^2}\right]. \qquad (17)$$

Proof. Let $\delta = \theta_X - \theta_Y$ be the difference of inverse means with
unknown CVs, and let $\hat{\delta}$ be its estimator,

$$\hat{\delta} = \frac{n + S_X^2/\bar{X}^2}{n\bar{X}} - \frac{m + S_Y^2/\bar{Y}^2}{m\bar{Y}}.$$

Since X and Y are independent, the mean of $\hat{\delta}$ is

$$E(\hat{\delta}) = E\left(\frac{n + S_X^2/\bar{X}^2}{n\bar{X}}\right) - E\left(\frac{m + S_Y^2/\bar{Y}^2}{m\bar{Y}}\right) = \left(1 + \frac{\sigma_X^2}{n\mu_X^2 + \sigma_X^2} + \frac{2\sigma_X^6 + 4n\mu_X^2\sigma_X^4}{(n\mu_X^2 + \sigma_X^2)^3}\right)\cdot\theta_X - \left(1 + \frac{\sigma_Y^2}{m\mu_Y^2 + \sigma_Y^2} + \frac{2\sigma_Y^6 + 4m\mu_Y^2\sigma_Y^4}{(m\mu_Y^2 + \sigma_Y^2)^3}\right)\cdot\theta_Y,$$

where the two expectations follow from Theorem 2.1 applied to each sample,

and, again by independence, the variance of $\hat{\delta}$ is

$$\mathrm{Var}(\hat{\delta}) = \mathrm{Var}\left(\frac{n + S_X^2/\bar{X}^2}{n\bar{X}} - \frac{m + S_Y^2/\bar{Y}^2}{m\bar{Y}}\right) = \mathrm{Var}\left(\frac{n + S_X^2/\bar{X}^2}{n\bar{X}}\right) + \mathrm{Var}\left(\frac{m + S_Y^2/\bar{Y}^2}{m\bar{Y}}\right),$$

where each variance is given by Theorem 2.1; substituting those expressions yields
(17). Hence, Theorem 3.1 is proved.

3.1 Generalized Confidence Interval for the Difference of Inverse


Means of Normal Distributions with Unknown Coefficients
of Variation
Define

$$\frac{(n-1)S_X^2}{\sigma_X^2} = V_X \sim \chi^2_{n-1} \quad\text{and}\quad \frac{(m-1)S_Y^2}{\sigma_Y^2} = V_Y \sim \chi^2_{m-1}, \qquad (18)$$

where $V_X$ and $V_Y$ have chi-squared distributions with n − 1 and m − 1 degrees of
freedom. Therefore, the generalized pivotal quantities for $\sigma_X^2$ and
$\sigma_Y^2$ are

$$R_{\sigma_X^2} = \frac{(n-1)s_X^2}{V_X} \quad\text{and}\quad R_{\sigma_Y^2} = \frac{(m-1)s_Y^2}{V_Y}. \qquad (19)$$

The means are given by

$$\mu_X \approx \bar{x} - \frac{Z_X}{\sqrt{U_X}}\sqrt{\frac{(n-1)s_X^2}{n}} \quad\text{and}\quad \mu_Y \approx \bar{y} - \frac{Z_Y}{\sqrt{U_Y}}\sqrt{\frac{(m-1)s_Y^2}{m}}, \qquad (20)$$

where $Z_X$ and $Z_Y$ denote standard normal random variables and $U_X$ and $U_Y$
denote chi-square random variables with n − 1 and m − 1 degrees of freedom,
respectively. Therefore, the generalized pivotal quantities for $\mu_X$ and $\mu_Y$ are

$$R_{\mu_X} = \bar{x} - \frac{Z_X}{\sqrt{U_X}}\sqrt{\frac{(n-1)s_X^2}{n}} \quad\text{and}\quad R_{\mu_Y} = \bar{y} - \frac{Z_Y}{\sqrt{U_Y}}\sqrt{\frac{(m-1)s_Y^2}{m}}. \qquad (21)$$

Therefore, the generalized pivotal quantity for δ is

$$R_{\delta} = R_{\theta_X} - R_{\theta_Y} = \frac{n + R_{\sigma_X^2}/R_{\mu_X}^2}{nR_{\mu_X}} - \frac{m + R_{\sigma_Y^2}/R_{\mu_Y}^2}{mR_{\mu_Y}}. \qquad (22)$$

Therefore, the 100(1 − α)% two-sided confidence interval for the difference of
inverse means of normal distributions with unknown CVs based on the GCI approach is

$$CI_{GCI.\delta} = \left(R_{\delta}(\alpha/2),\; R_{\delta}(1 - \alpha/2)\right), \qquad (23)$$

where $R_{\delta}(\alpha)$ denotes the 100(α)-th percentile of $R_{\delta}$.

3.2 Large Sample Confidence Interval for the Difference of Inverse


Means of Normal Distributions with Unknown Coefficients of
Variation

The pivotal statistic based on the normal approximation is

δ̂ − E(δ̂) δ̂ − δ
Z= = . (24)
V ar(δ̂) V ar(δ̂)

Therefore, the 100(1 − α)% two-sided confidence interval for the difference
of inverse means of normal distributions with unknown CVs based on the large
sample approach is
   
CILS.δ = δ̂ − z1−α/2 V ar(δ̂), δ̂ + z1−α/2 V ar(δ̂) , (25)

where z1−α/2 denotes the (1−α/2)-th quantile of the standard normal distribution.

3.3 Method of Variance Estimates Recovery Confidence Interval


for the Difference of Inverse Means of Normal Distributions
with Unknown Coefficients of Variation

According to Niwitpong and Wongkhao [5], the confidence intervals for the
inverse means of X and Y are
 √ √ 
n n
(lX , uX ) = √ , √ (26)
dX SX + nX̄ −dX SX + nX̄

and  √ √ 
m m
(lY , uY ) = √ , √ , (27)
dY SY + mȲ −dY SY + mȲ
where dX and dY are an upper (1 − α/2)-th quantiles of the t-distributions with
n − 1 and m − 1 degrees of freedom, respectively.

Donner and Zou [2] introduced confidence interval estimation for the difference of
two parameters of interest using the MOVER approach. Let $L_{\delta}$ and $U_{\delta}$
be the lower and upper limits of the confidence interval for the difference of the
two parameters, respectively. The lower and upper limits are given by

$$L_{\delta} = \hat{\theta}_X - \hat{\theta}_Y - \sqrt{(\hat{\theta}_X - l_X)^2 + (u_Y - \hat{\theta}_Y)^2} \qquad (28)$$

and

$$U_{\delta} = \hat{\theta}_X - \hat{\theta}_Y + \sqrt{(u_X - \hat{\theta}_X)^2 + (\hat{\theta}_Y - l_Y)^2}, \qquad (29)$$

where $\hat{\theta}_X$ and $\hat{\theta}_Y$ are defined in Eq. (15), $l_X$ and $u_X$
are defined in Eq. (26), and $l_Y$ and $u_Y$ are defined in Eq. (27).

Therefore, the 100(1 − α)% two-sided confidence interval for the difference of
inverse means of normal distributions with unknown CVs based on the MOVER approach is

$$CI_{MOVER.\delta} = (L_{\delta}, U_{\delta}), \qquad (30)$$

where $L_{\delta}$ and $U_{\delta}$ are defined in Eqs. (28) and (29), respectively.
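A Python sketch of the MOVER interval (30) is given below (illustrative only; the function name and interface are our own). With the summary statistics of Example 2 in Sect. 5 it should approximately reproduce CI_MOVER.δ = (0.4032, 0.6962).

```python
import numpy as np
from scipy import stats

def mover_ci_diff(xbar, s2x, n, ybar, s2y, m, alpha=0.05):
    """MOVER CI of Eq. (30) for theta_X - theta_Y."""
    theta_x = (n + s2x / xbar**2) / (n * xbar)                  # Eq. (15)
    theta_y = (m + s2y / ybar**2) / (m * ybar)
    dx = stats.t.ppf(1 - alpha / 2, n - 1)
    dy = stats.t.ppf(1 - alpha / 2, m - 1)
    sx, sy = np.sqrt(s2x), np.sqrt(s2y)
    lx = np.sqrt(n) / (dx * sx + np.sqrt(n) * xbar)             # Eq. (26)
    ux = np.sqrt(n) / (-dx * sx + np.sqrt(n) * xbar)
    ly = np.sqrt(m) / (dy * sy + np.sqrt(m) * ybar)             # Eq. (27)
    uy = np.sqrt(m) / (-dy * sy + np.sqrt(m) * ybar)
    low = theta_x - theta_y - np.sqrt((theta_x - lx)**2 + (uy - theta_y)**2)  # (28)
    upp = theta_x - theta_y + np.sqrt((ux - theta_x)**2 + (theta_y - ly)**2)  # (29)
    return low, upp

# Example 2 (carboxyhemoglobin levels): nonsmokers vs. cigarette smokers
print(mover_ci_diff(1.3000, 1.7040, 121, 4.1000, 4.0540, 75))
```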

4 Simulation Studies
The proposed confidence intervals in Sects. 2 and 3 were compared in terms of
coverage probability and average length. In this section, there are two simulation
studies. First, the proposed confidence intervals in Sect. 2 were compared with the
generalized confidence interval for the inverse mean of a normal distribution
introduced by Niwitpong and Wongkhao [5]. Second, the proposed confidence intervals
in Sect. 3 were compared with the generalized confidence interval for the difference
of inverse means of normal distributions presented by Niwitpong and Wongkhao [6].
The nominal confidence level was set at 1 − α = 0.95. A confidence interval was
considered preferable when its coverage probability was greater than or close to the
nominal confidence level and it also had the shortest average length.
Firstly, the performances of confidence intervals for the inverse mean of nor-
mal distribution with unknown CV were compared. The generalized confidence
interval for the inverse mean with unknown CV was defined as CIGCI.θ , the
large sample confidence interval for the inverse mean with unknown CV was
defined as CILS.θ , and the generalized confidence interval for the inverse mean
of Niwitpong and Wongkhao [5] was defined as CIN W . The data were generated
from a normal distribution with the population mean μ = 1, the population
standard deviation σ = 0.01, 0.03, 0.05, 0.07, 0.09, 0.10, 0.30, 0.50 and 0.70, and
the sample size n = 20, 30, 50, 100 and 200. Table 1 shows the coverage probabilities
and average lengths of 95% two-sided confidence intervals for θ and 1/μ. The results
indicated that the large sample confidence intervals CI_LS.θ have coverage
probabilities below the nominal confidence level of 0.95 when the sample size is
small and coverage probabilities close to 0.95 when the sample size is large. The
coverage probabilities of the generalized confidence intervals
CI_GCI.θ are as good as the coverage probabilities of the generalized confidence
intervals of Niwitpong and Wongkhao [5], CI_NW. The average lengths of CI_GCI.θ are
slightly shorter than the average lengths of CI_NW for σ ≤ 0.30, whereas the average
lengths of CI_NW are shorter than the average lengths of CI_GCI.θ for σ > 0.30.
Secondly, the coverage probabilities and average lengths of the proposed con-
fidence intervals for the difference of inverse means with unknown CVs were
obtained and hence compared with the generalized confidence interval for the
difference of inverse means of Niwitpong and Wongkhao [6]. The generalized
confidence interval for the difference of inverse means with unknown CVs was
defined as CIGCI.δ , the large sample confidence interval for the difference of
inverse means with unknown CVs was defined as CILS.δ , the MOVER con-
fidence interval for the difference of inverse means with unknown CVs was
defined as CIM OV ER.δ , and the generalized confidence interval for the differ-
ence of inverse means of Niwitpong and Wongkhao [6] was defined as CID.N W .
Two data sets were generated from X ∼ N(μ_X, σ_X²) and Y ∼ N(μ_Y, σ_Y²), where the
population means were μ_X = μ_Y = 1, the population standard deviations were
σ_X = 0.10 and σ_Y = 0.01, 0.03, 0.05, 0.07, 0.09, 0.10, 0.30, 0.50 and 0.70, and the
sample sizes were (n, m) = (20, 20), (30, 30), (20, 30), (50, 50), (100, 100),
(50, 100) and (200, 200). Tables 2 and 3 show the coverage probabilities and the
average lengths of 95% two-sided confidence intervals for the difference of inverse
means with unknown CVs, respectively. The results indicated that the coverage
probabilities of CI_GCI.δ, CI_MOVER.δ and CI_D.NW are close to the nominal confidence
level of 0.95, whereas CI_LS.δ gives coverage probabilities below 0.95. In almost all
cases, CI_GCI.δ and CI_MOVER.δ yield shorter average lengths than CI_D.NW. Therefore,
CI_GCI.δ and CI_MOVER.δ perform well in terms of coverage probability and average
length for the confidence intervals for the difference of inverse means with
unknown CVs.
difference of inverse means with unknown CVs.

5 An Empirical Application
In this section, some real data are used to illustrate the proposed confidence
intervals.
Example 1. Consider the data taken from Niwitpong [4] and Thangjai et al. [13].
This data shows the number of defects in 100,000 lines of code in a particular
type of software program. The observations are as follows 48, 54, 50, 38, 39,
48, 48, 38, 42, 52, 42, 36, 52, 55, 40, 40, 40, 43, 43, 40, 48, 46, 48, 48, 52,
48, 50, 48, 52, 52, 46, and 45. Niwitpong [4] showed that these data are well fitted
by a normal distribution. The summary statistics are as follows: n = 32, x̄ =
45.9688, s² = 27.7732, and 1/x̄ = 0.0218. The proposed confidence intervals
given in Sect. 2 were used to compute the 95% two-sided confidence intervals
for the inverse mean with unknown CV. The generalized confidence interval is
CIGCI.θ = (0.0209, 0.0227) with the interval length of 0.0018. The large sample
confidence interval is CILS.θ = (–0.8687, 0.9122) with the interval length of
1.7809. In comparison, the generalized confidence interval for the inverse mean

Table 1. The coverage probabilities (CP) and average lengths (AL) of 95% two-sided
confidence intervals for the inverse mean of normal distribution with unknown CV.

n σ CIGCI.θ CILS.θ CIN W


CP AL CP AL CP AL
20 0.01 0.9530 0.0092 0.9404 0.0086 0.9580 0.0095
0.03 0.9534 0.0278 0.9408 0.0260 0.9590 0.0285
0.05 0.9494 0.0467 0.9368 0.0437 0.9546 0.0479
0.07 0.9468 0.0648 0.9322 0.0605 0.9494 0.0664
0.09 0.9528 0.0837 0.9384 0.0781 0.9568 0.0857
0.10 0.9484 0.0930 0.9350 0.0868 0.9558 0.0953
0.30 0.9468 0.2931 0.9326 0.2653 0.9520 0.2957
0.50 0.9504 0.5417 0.9334 0.4584 0.9548 0.5266
0.70 0.9474 0.9668 0.9264 0.7041 0.9532 0.8444
30 0.01 0.9508 0.0074 0.9378 0.0071 0.9534 0.0075
0.03 0.9464 0.0222 0.9360 0.0213 0.9494 0.0226
0.05 0.9490 0.0370 0.9388 0.0354 0.9520 0.0376
0.07 0.9498 0.0519 0.9406 0.0497 0.9532 0.0528
0.09 0.9512 0.0671 0.9394 0.0641 0.9534 0.0681
0.10 0.9468 0.0742 0.9356 0.0710 0.9494 0.0755
0.30 0.9490 0.2288 0.9404 0.2149 0.9520 0.2304
0.50 0.9498 0.4074 0.9402 0.3683 0.9532 0.4019
0.70 0.9536 0.6422 0.9438 0.5462 0.9576 0.6059
50 0.01 0.9484 0.0056 0.9398 0.0055 0.9504 0.0057
0.03 0.9528 0.0170 0.9488 0.0165 0.9548 0.0171
0.05 0.9504 0.0282 0.9446 0.0275 0.9530 0.0285
0.07 0.9452 0.0397 0.9404 0.0386 0.9490 0.0400
0.09 0.9520 0.0511 0.9464 0.0498 0.9546 0.0516
0.10 0.9462 0.0567 0.9400 0.0552 0.9488 0.0572
0.30 0.9514 0.1729 0.9438 0.1665 0.9554 0.1735
0.50 0.9546 0.2994 0.9494 0.2827 0.9580 0.2972
0.70 0.9454 0.4449 0.9428 0.4086 0.9474 0.4328
100 0.01 0.9508 0.0040 0.9482 0.0039 0.9512 0.0040
0.03 0.9530 0.0119 0.9508 0.0117 0.9532 0.0119
0.05 0.9494 0.0198 0.9472 0.0195 0.9520 0.0199
0.07 0.9574 0.0277 0.9530 0.0274 0.9574 0.0278
0.09 0.9520 0.0357 0.9502 0.0352 0.9534 0.0358
0.10 0.9522 0.0396 0.9510 0.0391 0.9534 0.0397
0.30 0.9504 0.1200 0.9482 0.1178 0.9512 0.1202
0.50 0.9566 0.2033 0.9552 0.1979 0.9560 0.2028
0.70 0.9490 0.2920 0.9462 0.2810 0.9506 0.2888
200 0.01 0.9498 0.0028 0.9488 0.0028 0.9502 0.0028
0.03 0.9502 0.0084 0.9492 0.0083 0.9516 0.0084
0.05 0.9530 0.0139 0.9510 0.0138 0.9520 0.0140
0.07 0.9534 0.0195 0.9528 0.0194 0.9544 0.0196
0.09 0.9510 0.0251 0.9488 0.0249 0.9496 0.0251
0.10 0.9512 0.0279 0.9504 0.0277 0.9514 0.0280
0.30 0.9460 0.0839 0.9460 0.0832 0.9472 0.0840
0.50 0.9452 0.1410 0.9450 0.1391 0.9458 0.1408
0.70 0.9504 0.1998 0.9506 0.1961 0.9514 0.1988

Table 2. The coverage probabilities of 95% two-sided confidence intervals for the
difference of inverse means of normal distributions with unknown CVs.

n m σX σY CIGCI.δ CILS.δ CIM OV ER.δ CID.N W


20 20 0.10 0.01 0.9548 0.9416 0.9534 0.9600
0.03 0.9512 0.9394 0.9524 0.9562
0.05 0.9552 0.9402 0.9558 0.9622
0.07 0.9564 0.9448 0.9586 0.9624
0.09 0.9590 0.9444 0.9592 0.9640
0.10 0.9516 0.9402 0.9520 0.9562
0.30 0.9490 0.9328 0.9476 0.9548
0.50 0.9486 0.9306 0.9488 0.9548
0.70 0.9446 0.9310 0.9436 0.9484
30 30 0.10 0.01 0.9478 0.9382 0.9492 0.9532
0.03 0.9542 0.9476 0.9558 0.9584
0.05 0.9574 0.9472 0.9568 0.9594
0.07 0.9508 0.9424 0.9518 0.9566
0.09 0.9550 0.9470 0.9536 0.9574
0.10 0.9536 0.9454 0.9536 0.9566
0.30 0.9520 0.9414 0.9510 0.9542
0.50 0.9438 0.9384 0.9454 0.9494
0.70 0.9504 0.9380 0.9508 0.9552
20 30 0.10 0.01 0.9510 0.9374 0.9502 0.9556
0.03 0.9510 0.9384 0.9510 0.9544
0.05 0.9498 0.9372 0.9490 0.9534
0.07 0.9548 0.9424 0.9546 0.9578
0.09 0.9498 0.9404 0.9512 0.9560
0.10 0.9558 0.9464 0.9564 0.9602
0.30 0.9588 0.9484 0.9576 0.9616
0.50 0.9530 0.9414 0.9524 0.9558
0.70 0.9484 0.9362 0.9480 0.9540
50 50 0.10 0.01 0.9500 0.9440 0.9516 0.9540
0.03 0.9496 0.9424 0.9482 0.9516
0.05 0.9518 0.9478 0.9522 0.9536
0.07 0.9516 0.9470 0.9526 0.9540
0.09 0.9506 0.9472 0.9498 0.9522
0.10 0.9556 0.9500 0.9554 0.9568
0.30 0.9504 0.9440 0.9506 0.9516
0.50 0.9488 0.9384 0.9474 0.9522
0.70 0.9500 0.9434 0.9464 0.9512
(continued)

Table 2. (continued)

n m σX σY CIGCI.δ CILS.δ CIM OV ER.δ CID.N W


100 100 0.10 0.01 0.9528 0.9510 0.9538 0.9550
0.03 0.9518 0.9490 0.9516 0.9526
0.05 0.9492 0.9458 0.9494 0.9506
0.07 0.9574 0.9544 0.9562 0.9564
0.09 0.9580 0.9558 0.9588 0.9570
0.10 0.9482 0.9490 0.9502 0.9514
0.30 0.9544 0.9520 0.9556 0.9564
0.50 0.9472 0.9464 0.9492 0.9494
0.70 0.9470 0.9450 0.9484 0.9472
50 100 0.10 0.01 0.9450 0.9402 0.9454 0.9462
0.03 0.9496 0.9446 0.9490 0.9512
0.05 0.9502 0.9452 0.9506 0.9522
0.07 0.9522 0.9480 0.9512 0.9534
0.09 0.9488 0.9426 0.9486 0.9508
0.10 0.9470 0.9428 0.9470 0.9506
0.30 0.9468 0.9412 0.9458 0.9468
0.50 0.9482 0.9438 0.9482 0.9510
0.70 0.9518 0.9502 0.9534 0.9534
200 200 0.10 0.01 0.9478 0.9470 0.9490 0.9480
0.03 0.9440 0.9442 0.9446 0.9454
0.05 0.9480 0.9476 0.9484 0.9510
0.07 0.9532 0.9506 0.9520 0.9536
0.09 0.9472 0.9472 0.9484 0.9486
0.10 0.9490 0.9480 0.9494 0.9502
0.30 0.9522 0.9536 0.9540 0.9542
0.50 0.9452 0.9460 0.9464 0.9458
0.70 0.9464 0.9472 0.9480 0.9472

of Niwitpong and Wongkhao [5] is CIN W = (0.0208, 0.0227) with the interval
length of 0.0019. It is seen that CIGCI.θ performs better than CILS.θ and CIN W
in the sense that the length of CIGCI.θ is shorter than CILS.θ and CIN W .
Example 2. The data were previously considered by Lee and Lin [3] and Thangjai
et al. [13]. The data show the carboxyhemoglobin levels for nonsmokers and cigarette
smokers and are well fitted by normal distributions. For nonsmokers, the summary
statistics are as follows: n = 121, x̄ = 1.3000, s_X² = 1.7040, and 1/x̄ = 0.7692.
For cigarette smokers, the summary statistics are as follows: m = 75, ȳ = 4.1000,
s_Y² = 4.0540, and 1/ȳ = 0.2439. The difference between 1/x̄ and 1/ȳ is 0.5253.
The 95% two-sided confidence intervals for the

Table 3. The average lengths of 95% two-sided confidence intervals for the difference
of inverse means of normal distributions with unknown CVs.

n m σX σY CIGCI.δ CILS.δ CIM OV ER.δ CID.N W


20 20 0.10 0.01 0.0937 0.0874 0.0935 0.0960
0.03 0.0971 0.0907 0.0970 0.0995
0.05 0.1039 0.0972 0.1040 0.1064
0.07 0.1140 0.1067 0.1142 0.1168
0.09 0.1253 0.1173 0.1255 0.1283
0.10 0.1317 0.1232 0.1319 0.1349
0.30 0.3074 0.2790 0.3024 0.3105
0.50 0.5500 0.4667 0.5195 0.5354
0.70 1.0989 0.7194 0.8592 0.8704
30 30 0.10 0.01 0.0747 0.0714 0.0745 0.0759
0.03 0.0777 0.0743 0.0776 0.0789
0.05 0.0833 0.0798 0.0833 0.0846
0.07 0.0910 0.0872 0.0911 0.0924
0.09 0.1002 0.0959 0.1002 0.1017
0.10 0.1053 0.1009 0.1054 0.1070
0.30 0.2416 0.2272 0.2392 0.2434
0.50 0.4169 0.3772 0.4036 0.4115
0.70 0.6480 0.5513 0.5994 0.6126
20 30 0.10 0.01 0.0933 0.0871 0.0931 0.0955
0.03 0.0957 0.0895 0.0956 0.0981
0.05 0.1001 0.0939 0.1001 0.1023
0.07 0.1068 0.1005 0.1069 0.1092
0.09 0.1147 0.1082 0.1149 0.1172
0.10 0.1197 0.1130 0.1198 0.1221
0.30 0.2485 0.2332 0.2464 0.2508
0.50 0.4187 0.3789 0.4058 0.4137
0.70 0.6494 0.5524 0.6010 0.6141
50 50 0.10 0.01 0.0569 0.0555 0.0569 0.0575
0.03 0.0592 0.0577 0.0592 0.0597
0.05 0.0634 0.0618 0.0634 0.0640
0.07 0.0692 0.0675 0.0692 0.0699
0.09 0.0765 0.0745 0.0765 0.0771
0.10 0.0802 0.0782 0.0802 0.0809
0.30 0.1822 0.1759 0.1813 0.1831
0.50 0.3042 0.2877 0.2991 0.3025
0.70 0.4450 0.4092 0.4284 0.4336
(continued)

Table 3. (continued)

n m σX σY CIGCI.δ CILS.δ CIM OV ER.δ CID.N W


100 100 0.10 0.01 0.0398 0.0393 0.0398 0.0400
0.03 0.0414 0.0408 0.0414 0.0416
0.05 0.0443 0.0438 0.0443 0.0445
0.07 0.0484 0.0478 0.0484 0.0486
0.09 0.0533 0.0527 0.0534 0.0535
0.10 0.0560 0.0554 0.0561 0.0563
0.30 0.1264 0.1242 0.1260 0.1267
0.50 0.2072 0.2017 0.2056 0.2067
0.70 0.2949 0.2837 0.2900 0.2917
50 100 0.10 0.01 0.0567 0.0552 0.0567 0.0572
0.03 0.0579 0.0564 0.0579 0.0585
0.05 0.0601 0.0587 0.0601 0.0607
0.07 0.0632 0.0617 0.0632 0.0637
0.09 0.0669 0.0655 0.0669 0.0675
0.10 0.0692 0.0678 0.0693 0.0698
0.30 0.1328 0.1303 0.1325 0.1333
0.50 0.2113 0.2054 0.2096 0.2108
0.70 0.2986 0.2873 0.2938 0.2956
200 200 0.10 0.01 0.0280 0.0278 0.0280 0.0281
0.03 0.0291 0.0289 0.0291 0.0292
0.05 0.0311 0.0310 0.0311 0.0312
0.07 0.0340 0.0338 0.0340 0.0341
0.09 0.0375 0.0373 0.0375 0.0376
0.10 0.0394 0.0392 0.0394 0.0395
0.30 0.0887 0.0879 0.0885 0.0888
0.50 0.1438 0.1418 0.1431 0.1435
0.70 0.2016 0.1978 0.1999 0.2006

difference of inverse means with unknown CVs were computed using the pro-
posed confidence intervals given in Sect. 3. The generalized confidence interval
is CIGCI.δ = (0.4107, 0.7017) with the interval length of 0.2910. The large sam-
ple confidence interval is CILS.δ = (0.3158, 0.7461) with the interval length of
0.4303. The MOVER confidence interval is CIM OV ER.δ = (0.4032, 0.6962) with
the interval length of 0.2930. In comparison, the generalized confidence interval
for the difference of inverse means of Niwitpong and Wongkhao [6] is CID.N W =
(0.3988, 0.7070) with the interval length of 0.3082. It is found that CI_GCI.δ and
CI_MOVER.δ provide shorter interval lengths than CI_LS.δ and CI_D.NW. Therefore,
CI_GCI.δ and CI_MOVER.δ perform better than CI_LS.δ and CI_D.NW.

6 Discussion and Conclusions


This paper is an extension of the previous works of Thangjai et al. [12], Thangjai
et al. [13] and Thangjai et al. [14]. The performance of the new estimators and the
well-established estimators was compared, using the coverage probability and average
length of the confidence intervals for the comparative study. The new estimator of
the inverse mean with unknown CV is θ and the new estimator of the difference of
inverse means with unknown CVs is δ. The well-established estimators of the inverse
mean and the difference of inverse means are 1/μ and (1/μ_X) − (1/μ_Y), respectively.

For the single inverse mean, the generalized confidence interval (CI_GCI.θ) and the
large sample confidence interval (CI_LS.θ) for the new estimator were compared with
the generalized confidence interval (CI_NW) for 1/μ of Niwitpong and Wongkhao [5].
The results indicated that CI_GCI.θ and CI_NW perform well in terms of coverage
probability. For σ ≤ 0.30, CI_GCI.θ provides shorter average lengths than CI_NW;
for σ > 0.30, the average lengths of CI_NW are shorter than those of CI_GCI.θ. It can
be concluded that CI_GCI.θ is suggested for the single inverse mean with unknown CV
when the population standard deviation is small (σ ≤ 0.30).

For the difference of inverse means, the generalized confidence interval (CI_GCI.δ),
the large sample confidence interval (CI_LS.δ) and the MOVER confidence interval
(CI_MOVER.δ) for the new estimator were compared with the generalized confidence
interval (CI_D.NW) for (1/μ_X) − (1/μ_Y) of Niwitpong and Wongkhao [6]. The coverage
probabilities of CI_GCI.δ, CI_MOVER.δ and CI_D.NW are close to the nominal confidence
level of 0.95. Moreover, CI_GCI.δ and CI_MOVER.δ yield shorter average lengths than
CI_D.NW in almost all cases. Therefore, CI_GCI.δ and CI_MOVER.δ perform well in terms
of coverage probability and average length. However, the MOVER confidence interval is
easy to use because it has a simple closed-form expression. Hence, the MOVER
confidence interval is recommended as an interval estimator for the difference of
inverse means with unknown CVs.

Acknowledgements. This research was funded by King Mongkut’s University of


Technology North Bangkok. Contract no. KMUTNB-61-DRIVE-006.

References
1. Blumenfeld, D.: Operations Research Calculations Handbook. Boca Raton,
New York (2001)
2. Donner, A., Zou, G.Y.: Closed-form confidence intervals for function of the normal
mean and standard deviation. Stat. Methods Med. Res. 21, 347–359 (2010)
3. Lee, J.C., Lin, S.H.: Generalized confidence intervals for the ratio of means of two
normal populations. J. Stat. Plann. Infer. 123, 49–60 (2004)
4. Niwitpong, S.: Confidence intervals for the normal mean with a known coefficient
of variation. Far East J. Math. Sci. 97, 711–727 (2015)
5. Niwitpong, S., Wongkhao, A.: Confidence interval for the inverse of normal mean.
Far East J. Math. Sci. 98, 689–698 (2015)

6. Niwitpong, S., Wongkhao, A.: Confidence intervals for the difference between
inverse of normal means. Adv. Appl. Stat. 48, 337–347 (2016)
7. Sahai, A.: On an estimator of normal population mean and UMVU estimation of
its relative efficiency. Appl. Math. Comput. 152, 701–708 (2004)
8. Sahai, A., Acharya, R.M.: Iterative estimation of normal population mean using
computational-statistical intelligence. Comput. Sci. Tech. 4, 500–508 (2016)
9. Searls, D.T.: The utilization of a known coefficient of variation in the estimation
procedure. J. Am. Stat. Assoc. 59, 1225–1226 (1964)
10. Srivastava, V.K.: A note on the estimation of mean in normal population. Metrika
27, 99–102 (1980)
11. Sodanin, S., Niwitpong, S., Niwitpong, S.: Generalized confidence intervals for the
normal mean with unknown coefficient of variation. In: AIP Conference Proceed-
ings, vol. 1775, pp. 030043-1–030043-8 (2016)
12. Thangjai, W., Niwitpong, S., Niwitpong, S.: Inferences on the common inverse
mean of normal distribution. In: AIP Conference Proceedings, vol. 1775, pp.
030027-1–030027-8 (2016)
13. Thangjai, W., Niwitpong, S., Niwitpong, S.: Confidence intervals for mean and
difference of means of normal distributions with unknown coefficients of variation.
Mathematics 5, 1–23 (2017)
14. Thangjai, W., Niwitpong, S., Niwitpong, S.: On large sample confidence intervals
for the common inverse mean of several normal populations. Adv. Appl. Stat. 51,
59–84 (2017)
15. Weerahandi, S.: Generalized confidence intervals. J. Am. Stat. Assoc. 88, 899–905
(1993)
Confidence Intervals for the Mean
of Delta-Lognormal Distribution

Patcharee Maneerat(B) , Sa-Aat Niwitpong, and Suparat Niwitpong

Faculty of Applied Science, Department of Applied Statistics,


King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
m.patcharee@uru.ac.th, sa-aat.n@sci.kmutnb.ac.th, suparatn@kmutnb.ac.th

Abstract. Focusing on delta-lognormal distribution, confidence inter-


vals for the mean are proposed in this research. This will be achieved
using a method of variance estimates recovery (MOVER) and gener-
alized confidence interval (GCI) based on weighted beta distribution
by Hannig, and MOVER based on variance stabilized transformation
(VST). These are then compared with GCI based on VST. The coverage
probabilities and average lengths of the presented methods, computed
via Monte Carlo simulation, are used as performance measures. Our results showed that
MOVER based on VST is the recommended method under situations of
slight probability of being zero and large coefficient of variation in small
to moderate sample sizes. Ultimately, rainfall data in Chiang Mai was
used to illustrate all of the presented methods.

Keywords: Mean · Delta-lognormal distribution · Method of variance estimates recovery · Generalized confidence interval · Rainfall

1 Introduction
Some data have two components: zero observations and positive observations that
follow a lognormal distribution. Such data follow the delta-lognormal distribution,
first introduced by Aitchison [1]. Data sets that fit the delta-lognormal
distribution arise in many research areas such as medicine, the environment and
fisheries. For example, the diagnostic test charges of a patient group were utilized
in Callahan's study [3,4,10,15,19,22]. The airborne chlorine concentration was recorded at an
industrial site, United States [10,16,17,19]. The red cod density was inspected
by the National Institute of Water and Atmospheric Research, New Zealand
[8,21]. Also, the monthly rainfall totals in Bloemfontein and Kimberley cities
were surveyed and collected by the South African Weather Service [9].
In statistical inference, one of the parameters of interest is the mean or expected
value of a random variable. The mean has been applied in several fields. In
medicine, it was used to compare the outpatient cost
between before and after a Medicaid policy change, Indiana state, United States
[2], as well as to investigate medical cost of patient groups: patients with type
I diabetes and patients being treated for diabetic ketoacidosis [5]. In environ-
ment, it was used to estimate airborne concentrations in the area of industrial
sites [10,16,19], and to analyze the monthly rainfall totals in Bloemfontein and
Kimberley cities, South African [9]. In pharmacokinetics, it was used to examine
the maximum concentration (Cmax) in men from an alcohol interaction study
[13,18], and to assess the relative carboxyhemoglobin levels for two groups: non-
smokers and cigarette smokers [14].
Furthermore, several researchers have studied and developed methods for constructing confidence intervals for the delta-lognormal mean. For example, Zhou and Tu [22] showed that the bootstrap method had the best accuracy for small sample sizes, except under small skewness. Tian and Wu [17] showed that the adjusted signed log-likelihood ratio statistic performed best in terms of coverage probabilities and the symmetry of upper and lower tail error probabilities. Fletcher [8] recommended the profile-likelihood interval, whose error rates were within 1% (lower limit) or 3% (upper limit) of the nominal level, except for small sample sizes and moderate to high skewness. Li et al. [15] suggested that the fiducial approach has highly accurate coverage and fairly low bias. Wu and Hsieh [21] confirmed that the asymptotic generalized pivotal quantity performed well in terms of coverage, expected interval length and relative bias. Finally, Hasan and Krishnamoorthy [10] proposed a modification of Tian's [19] generalized confidence interval, which was precise and satisfactory in terms of coverage and maintained balanced tail error rates better than existing methods.
As mentioned above, the delta-lognormal distribution has many real-life applications, and various methods have been developed to construct confidence intervals for its mean. However, these methods still have restrictions in some situations. Therefore, the aim of this study is to search for better confidence intervals for the mean using MOVER and GCI based on Hannig's weighted beta distribution [11], abbreviated MOVER-1 and GCI-1, respectively, and MOVER based on VST (MOVER-2). These three methods are compared with the GCI based on VST of Wu and Hsieh [21], abbreviated GCI-2. The outline of this article is as follows: the methods for establishing confidence intervals for the delta-lognormal mean are described in Sect. 2. In Sect. 3, numerical results on the coverage probabilities and average lengths of all methods are presented. The rainfall data from Chiang Mai are analyzed with the proposed methods in Sect. 4. The article closes with a brief discussion and conclusion.

2 Methods

Let Y = (Y₁, Y₂, ..., Yₙ) denote a random sample from the delta-lognormal distribution, denoted by Δ(μ, σ², δ), where δ = P(Yᵢ = 0) and n₀ ∼ B(n, δ); n₀ stands for the number of zero observations. The distribution function of Y was presented by Aitchison [1] and Tian and Wu [17], and is defined as

$$H(y_i;\mu,\sigma^2,\delta)=\begin{cases}\delta, & y_i=0,\\ \delta+(1-\delta)\,G(y_i;\mu,\sigma^2), & y_i>0,\end{cases}\qquad(1)$$

where G(yᵢ; μ, σ²) stands for the distribution function of the lognormal distribution LN(μ, σ²). The maximum likelihood estimates of μ, σ² and δ are

$$\hat\mu=\frac{1}{n_1}\sum_{i=1}^{n_1}\ln y_i,\qquad \hat\sigma^2=\frac{1}{n_1}\sum_{i=1}^{n_1}\bigl(\ln y_i-\hat\mu\bigr)^2,\qquad \hat\delta=\frac{n_0}{n},$$

where n₁ stands for the number of nonzero observations and n = n₀ + n₁. The mean, variance and coefficient of variation (CV) of Yᵢ are

$$\theta=(1-\delta)\exp\Bigl(\mu+\frac{\sigma^2}{2}\Bigr)\qquad(2)$$

$$\lambda=(1-\delta)\exp\bigl(2\mu+\sigma^2\bigr)\bigl[\exp\bigl(\sigma^2\bigr)+\delta-1\bigr]\qquad(3)$$

$$\tau=\sqrt{\frac{\exp(\sigma^2)+\delta-1}{1-\delta}}\qquad(4)$$

Then the logarithm of the mean can be written as

$$\xi=\ln(1-\delta)+\mu+\frac{\sigma^2}{2}\qquad(5)$$
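To make the notation concrete, here is a minimal Python sketch (our own illustration, assuming only NumPy; the function name is not from the paper) that computes μ̂, σ̂², δ̂ and the log-mean ξ̂ from a sample containing zeros:

```python
import numpy as np

def delta_lognormal_estimates(y):
    """Point estimates for a delta-lognormal sample y (zeros allowed).

    Returns (mu_hat, sigma2_hat, delta_hat, xi_hat) following the definitions
    above and Eq. (5).  sigma2_hat is the MLE (divisor n1); Sect. 2.2 and the
    empirical application use the unbiased version (divisor n1 - 1).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    pos = y[y > 0]
    n1 = pos.size
    n0 = n - n1
    log_pos = np.log(pos)
    mu_hat = log_pos.mean()
    sigma2_hat = ((log_pos - mu_hat) ** 2).sum() / n1   # MLE, divisor n1
    delta_hat = n0 / n
    xi_hat = np.log(1.0 - delta_hat) + mu_hat + sigma2_hat / 2.0
    return mu_hat, sigma2_hat, delta_hat, xi_hat
```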
Four methods for establishing confidence intervals for ξ are presented below.

2.1 The Generalized Confidence Interval


The idea of the generalized confidence interval was first presented by Weerahandi [20]. It is a basic method for constructing a confidence interval for a parameter of interest based on the concept of a generalized pivotal quantity (GPQ), described as follows.
Let Y = (Y₁, Y₂, ..., Yₙ) be a random sample from a probability density function f(yᵢ; δ, η), where δ and η stand for the parameter of interest and the nuisance parameter, respectively, and let y = (y₁, y₂, ..., yₙ) be an observed value of Y. Then R(Y; y, δ, η) is a generalized pivotal quantity if it satisfies the following conditions:
(i) For a given y, the probability distribution of R(Y; y, δ, η) is free of all unknown parameters.
(ii) The observed value r(y; y, δ, η) of R(Y; y, δ, η) depends only on the parameter of interest.
Then CI_gci = [R_{α/2}, R_{1−α/2}] is a 100(1 − α)% two-sided confidence interval for the parameter of interest based on GCI, where R_α stands for the αth percentile of R(Y; y, δ, η).

The GPQs of μ, σ² and δ are now considered for constructing confidence intervals for ξ based on GCI. First, the GPQs of μ and σ² were proposed by Krishnamoorthy and Mathew [12] as

$$R_\mu=\hat\mu-W\sqrt{\frac{(n_1-1)\hat\sigma^2}{n_1U}}\qquad(6)$$

$$R_{\sigma^2/2}=\frac{(n_1-1)\hat\sigma^2}{2U}\qquad(7)$$

where $W=(\hat\mu-\mu)\big/\sqrt{(n_1-1)\hat\sigma^2/(n_1U)}$ and $U=(n_1-1)\hat\sigma^2/\sigma^2$ are independent random variables following the standard normal distribution and the chi-square distribution with n₁ − 1 degrees of freedom, respectively.

2.1.1 The Generalized Confidence Interval-1


The generalized fiducial quantity (GFQ) of δ was presented by Hannig [11], who found that an equally weighted mixture of two beta distributions is the best GFQ for δ. This GFQ was also used by Li et al. [15], and it follows the concept of a GPQ. Consequently, the GPQ of δ applied in this study is

$$R_{\delta.H}\sim\tfrac{1}{2}\bigl[\mathrm{Beta}(n_0,n_1+1)+\mathrm{Beta}(n_0+1,n_1)\bigr]\qquad(8)$$

Since the GPQs in Eqs. (6), (7) and (8) satisfy the two conditions of Weerahandi [20], the GPQ of ξ is given by

$$R_{\xi.H}=\ln(1-R_{\delta.H})+R_\mu+R_{\sigma^2/2}\qquad(9)$$

As a result, the 100(1 − α)% two-sided confidence interval for ξ based on GCI-1 is

$$CI_{gci.H}=[L_{gci.H},U_{gci.H}]=[R_{\xi.H}(\alpha/2),R_{\xi.H}(1-\alpha/2)]\qquad(10)$$

where R_{ξ.H}(α) stands for the αth percentile of R_{ξ.H}.
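The GCI-1 interval can be approximated by simulating the pivotal quantities in Eqs. (6)-(9) and taking empirical percentiles as in Eq. (10). The sketch below is our own illustration (assuming NumPy, at least one zero and at least two positive observations); it uses the unbiased log-scale variance, consistent with Eqs. (6), (7) and (14):

```python
import numpy as np

def gci1_interval(y, alpha=0.05, n_pivots=5000, seed=None):
    """Approximate GCI-1 interval for xi = ln(1 - delta) + mu + sigma^2/2."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    pos = y[y > 0]
    n1 = pos.size
    n0 = n - n1
    log_pos = np.log(pos)
    mu_hat = log_pos.mean()
    sigma2_hat = ((log_pos - mu_hat) ** 2).sum() / (n1 - 1)   # unbiased variance

    # Pivots for mu and sigma^2 / 2, Eqs. (6)-(7)
    U = rng.chisquare(n1 - 1, size=n_pivots)
    W = rng.standard_normal(n_pivots)
    R_mu = mu_hat - W * np.sqrt((n1 - 1) * sigma2_hat / (n1 * U))
    R_sig2_half = (n1 - 1) * sigma2_hat / (2.0 * U)

    # Hannig's weighted-beta fiducial quantity for delta, Eq. (8)
    R_delta = 0.5 * (rng.beta(n0, n1 + 1, size=n_pivots)
                     + rng.beta(n0 + 1, n1, size=n_pivots))

    R_xi = np.log(1.0 - R_delta) + R_mu + R_sig2_half            # Eq. (9)
    return np.quantile(R_xi, [alpha / 2.0, 1.0 - alpha / 2.0])    # Eq. (10)
```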

2.1.2 The Generalized Confidence Interval-2


Dasgupta [6] showed that the coverage probability of the VST-based interval is essentially better than that of the Wald interval. Wu and Hsieh [21] then applied the GPQ of δ based on the VST to construct intervals for the delta-lognormal mean, namely

$$R_{\delta.vst}=\sin^2\Bigl(\arcsin\sqrt{\hat\delta}-\frac{T}{2\sqrt{n}}\Bigr)\qquad(11)$$

where $T=2\sqrt{n}\bigl(\arcsin\sqrt{\hat\delta}-\arcsin\sqrt{\delta}\bigr)\sim N(0,1)$ as n → ∞. From the three pivots (6), (7) and (11), the GPQ for ξ is defined as

$$R_{\xi.vst}=\ln(1-R_{\delta.vst})+R_\mu+R_{\sigma^2/2}\qquad(12)$$

where the random variables T, W and U are independent. Recall that the GPQs of μ, σ² and δ satisfy the conditions for a generalized pivotal quantity, so the GPQ of ξ does as well. Therefore, the 100(1 − α)% two-sided confidence interval for ξ based on GCI-2 is

$$CI_{gci.vst}=[L_{gci.vst},U_{gci.vst}]=[R_{\xi.vst}(\alpha/2),R_{\xi.vst}(1-\alpha/2)]\qquad(13)$$

where R_{ξ.vst}(α) denotes the αth percentile of R_{ξ.vst}.
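GCI-2 differs from GCI-1 only in the pivot used for δ; under the same assumptions as the previous sketch, the weighted-beta draw is replaced by the VST pivot of Eq. (11), roughly as follows:

```python
import numpy as np

def r_delta_vst(delta_hat, n, n_pivots, rng):
    """VST pivot for delta, Eq. (11): sin^2(arcsin(sqrt(delta_hat)) - T/(2 sqrt(n)))."""
    T = rng.standard_normal(n_pivots)
    val = np.sin(np.arcsin(np.sqrt(delta_hat)) - T / (2.0 * np.sqrt(n))) ** 2
    # Clip slightly below 1 so that ln(1 - R_delta) in Eq. (12) stays finite.
    return np.clip(val, 0.0, 1.0 - 1e-12)
```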

2.2 The Method of Variance Estimates Recovery


This simple method was proposed by Donner and Zou [7]. Under the normal distribution, it solves the problem by recovering variance estimates from confidence intervals computed individually for the mean and the standard deviation. These ideas are adopted in this study, where μ, σ² and δ are the parameters of the delta-lognormal distribution and the log-mean ξ is a function of these parameters. For σ², the unbiased estimate is $\hat\sigma^2=\sum_{i=1}^{n_1}(\ln y_i-\hat\mu)^2/(n_1-1)$, so that

$$U=\frac{(n_1-1)\hat\sigma^2}{\sigma^2}\sim\chi^2_{n_1-1}\qquad(14)$$

where χ²_{n₁−1} denotes the chi-square distribution with n₁ − 1 degrees of freedom and σ̂² is the sample variance of the logarithms of the positive observations. At significance level α, the coverage statement for χ²_{n₁−1} used to estimate σ² is

$$P\bigl(\chi^2_{\alpha/2,n_1-1}\le\chi^2_{n_1-1}\le\chi^2_{1-\alpha/2,n_1-1}\bigr)=1-\alpha\qquad(15)$$

Hence, the 100(1 − α)% confidence interval for σ² is

$$CI_{\sigma^2}=[l_{\sigma^2},u_{\sigma^2}]=\left[\frac{(n_1-1)\hat\sigma^2}{\chi^2_{1-\alpha/2,n_1-1}},\;\frac{(n_1-1)\hat\sigma^2}{\chi^2_{\alpha/2,n_1-1}}\right]\qquad(16)$$

Similarly, the MLE of μ is unbiased, so that

$$W=(\hat\mu-\mu)\Big/\sqrt{\frac{(n_1-1)\hat\sigma^2}{n_1U}}\sim N(0,1)\qquad(17)$$

which is (approximately) standard normal by the central limit theorem (CLT). To estimate the mean on the log scale, the coverage statement for W at significance level α is

$$P\bigl(W_{\alpha/2}<W<W_{1-\alpha/2}\bigr)=1-\alpha\qquad(18)$$

As a result, the 100(1 − α)% confidence interval for μ is

$$CI_\mu=[l_\mu,u_\mu]=\left[\hat\mu-W_{1-\alpha/2}\sqrt{\frac{(n_1-1)\hat\sigma^2}{n_1U}},\;\hat\mu+W_{1-\alpha/2}\sqrt{\frac{(n_1-1)\hat\sigma^2}{n_1U}}\right]\qquad(19)$$
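The two building-block intervals (16) and (19) can be sketched as follows (our illustration, assuming SciPy). Note that Eq. (19) is written in terms of the pivot U; the sketch follows the common practical reading in which the unknown $\sqrt{(n_1-1)\hat\sigma^2/(n_1U)}=\sigma/\sqrt{n_1}$ is replaced by its estimate σ̂/√n₁, which is an assumption on our part rather than a statement by the authors:

```python
import numpy as np
from scipy import stats

def ci_sigma2(sigma2_hat, n1, alpha=0.05):
    """Chi-square interval for sigma^2, Eq. (16); sigma2_hat is the unbiased estimate."""
    lo = (n1 - 1) * sigma2_hat / stats.chi2.ppf(1 - alpha / 2, n1 - 1)
    hi = (n1 - 1) * sigma2_hat / stats.chi2.ppf(alpha / 2, n1 - 1)
    return lo, hi

def ci_mu(mu_hat, sigma2_hat, n1, alpha=0.05):
    """Normal-theory interval for mu, one practical reading of Eq. (19):
    sigma/sqrt(n1) is replaced by sigma_hat/sqrt(n1)."""
    z = stats.norm.ppf(1 - alpha / 2)
    half = z * np.sqrt(sigma2_hat / n1)
    return mu_hat - half, mu_hat + half
```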

For δ, two methods are presented for constructing confidence intervals: the weighted beta distribution of Hannig [11] and the VST, as follows.

The weighted beta distribution by Hannig [11]
This study applies the weighted beta distribution of Hannig [11] to establish a confidence interval for δ. Let B₁ ∼ Beta(n₀, n₁ + 1) and B₂ ∼ Beta(n₀ + 1, n₁), and define

$$\mathrm{Beta}_w=(B_1+B_2)/2\qquad(20)$$

Then the 100(1 − α)% confidence interval for δ based on the weighted beta distribution is

$$CI_{\delta H}=[l_{\delta H},u_{\delta H}]=[\mathrm{Beta}_w(\alpha/2),\mathrm{Beta}_w(1-\alpha/2)]\qquad(21)$$

where Beta_w(α) stands for the αth percentile of Beta_w.

The variance stabilizing transformation
Dasgupta [6] presented the VST for constructing a confidence interval for δ. Recall that n₀ ∼ B(n, δ) and δ̂ is approximately N(δ, δ(1 − δ)/n). By the delta method,

$$\sqrt{n}\bigl(\hat\delta-\delta\bigr)\xrightarrow{d}N\bigl(0,\delta(1-\delta)\bigr)\qquad(22)$$

A variance-stabilizing transformation is $g(\delta)=\int\frac{1/2}{\sqrt{\delta(1-\delta)}}\,d\delta=\arcsin\sqrt{\delta}$, and $g(n_0/n)=\arcsin\sqrt{n_0/n}$ is the corresponding sample version, so that

$$\sqrt{n}\bigl(\arcsin\sqrt{\hat\delta}-\arcsin\sqrt{\delta}\bigr)\xrightarrow{d}N(0,1/4)\qquad(23)$$

Equivalently, $W=2\sqrt{n}\bigl(\arcsin\sqrt{\hat\delta}-\arcsin\sqrt{\delta}\bigr)\xrightarrow{d}N(0,1)$. Therefore, the 100(1 − α)% confidence interval for δ based on the VST is

$$CI_{\delta V}=[l_{\delta V},u_{\delta V}]=\left[\sin^2\Bigl(\arcsin\sqrt{\hat\delta}-\frac{W_{1-\alpha/2}}{2\sqrt{n}}\Bigr),\;\sin^2\Bigl(\arcsin\sqrt{\hat\delta}+\frac{W_{1-\alpha/2}}{2\sqrt{n}}\Bigr)\right]\qquad(24)$$
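Both confidence intervals for δ, Eqs. (21) and (24), can be sketched as follows (our illustration; the weighted-beta interval is obtained from Monte Carlo percentiles, and the VST endpoints are clamped to [0, π/2] as a practical guard not spelled out in the paper):

```python
import numpy as np
from scipy import stats

def ci_delta_hannig(n0, n1, alpha=0.05, n_draws=5000, seed=None):
    """Percentile interval for delta from the weighted beta mixture, Eqs. (20)-(21)."""
    rng = np.random.default_rng(seed)
    beta_w = 0.5 * (rng.beta(n0, n1 + 1, size=n_draws)
                    + rng.beta(n0 + 1, n1, size=n_draws))
    return tuple(np.quantile(beta_w, [alpha / 2, 1 - alpha / 2]))

def ci_delta_vst(n0, n, alpha=0.05):
    """Arcsine (variance-stabilized) interval for delta, Eq. (24)."""
    z = stats.norm.ppf(1 - alpha / 2)
    a = np.arcsin(np.sqrt(n0 / n))
    lo = np.sin(np.maximum(a - z / (2 * np.sqrt(n)), 0.0)) ** 2
    hi = np.sin(np.minimum(a + z / (2 * np.sqrt(n)), np.pi / 2)) ** 2
    return lo, hi
```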
Now the method of variance estimates recovery itself is considered. Let

$$\xi=\xi_1+\xi_2+\xi_3=\ln(1-\delta)+\mu+\frac{\sigma^2}{2}\qquad(25)$$

Using μ̂, σ̂² and δ̂ from the sample, the estimate of ξ is $\hat\xi=\hat\xi_1+\hat\xi_2+\hat\xi_3=\ln(1-\hat\delta)+\hat\mu+\hat\sigma^2/2$. First, the 100(1 − α)% two-sided confidence interval for ξ₂ + ξ₃ is constructed based on MOVER, namely

$$CI_{\xi_2+\xi_3}=[l_{\xi_2+\xi_3},u_{\xi_2+\xi_3}]\qquad(26)$$

where

$$l_{\xi_2+\xi_3}=(\hat\xi_2+\hat\xi_3)-\sqrt{(\hat\xi_2-l_\mu)^2+\Bigl(\hat\xi_3-\frac{l_{\sigma^2}}{2}\Bigr)^2},\qquad u_{\xi_2+\xi_3}=(\hat\xi_2+\hat\xi_3)+\sqrt{(u_\mu-\hat\xi_2)^2+\Bigl(\frac{u_{\sigma^2}}{2}-\hat\xi_3\Bigr)^2}$$

Next, this interval is combined with the interval for ξ₁. The 100(1 − α)% two-sided confidence interval for ξ based on the MOVER approach is then given by

$$CI_\xi=[L_\xi,U_\xi]\qquad(27)$$

where

$$L_\xi=\hat\xi-\sqrt{\bigl[\hat\xi_1-\ln(1-u_\delta)\bigr]^2+\bigl[(\hat\xi_2+\hat\xi_3)-l_{\xi_2+\xi_3}\bigr]^2},\qquad U_\xi=\hat\xi+\sqrt{\bigl[\ln(1-l_\delta)-\hat\xi_1\bigr]^2+\bigl[u_{\xi_2+\xi_3}-(\hat\xi_2+\hat\xi_3)\bigr]^2}$$

The limits l_δ and u_δ depend on the confidence interval chosen for δ, as described below.

2.2.1 The Method of Variance Estimates Recovery-1


The 100(1 − α)% two-sided confidence interval for ξ based on MOVER-1 is given by

$$CI_{m.H}=[L_{m.H},U_{m.H}]\qquad(28)$$

where

$$L_{m.H}=\hat\xi-\sqrt{\bigl[\hat\xi_1-\ln(1-u_{\delta H})\bigr]^2+\bigl[(\hat\xi_2+\hat\xi_3)-l_{\xi_2+\xi_3}\bigr]^2},\qquad U_{m.H}=\hat\xi+\sqrt{\bigl[\ln(1-l_{\delta H})-\hat\xi_1\bigr]^2+\bigl[u_{\xi_2+\xi_3}-(\hat\xi_2+\hat\xi_3)\bigr]^2}$$

2.2.2 The Method of Variance Estimates Recovery-2


The 100(1 − α)% two-sided confidence interval for ξ based on MOVER-2 is given by

$$CI_{m.vst}=[L_{m.vst},U_{m.vst}]\qquad(29)$$

where

$$L_{m.vst}=\hat\xi-\sqrt{\bigl[\hat\xi_1-\ln(1-u_{\delta V})\bigr]^2+\bigl[(\hat\xi_2+\hat\xi_3)-l_{\xi_2+\xi_3}\bigr]^2},\qquad U_{m.vst}=\hat\xi+\sqrt{\bigl[\ln(1-l_{\delta V})-\hat\xi_1\bigr]^2+\bigl[u_{\xi_2+\xi_3}-(\hat\xi_2+\hat\xi_3)\bigr]^2}$$
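Putting the pieces together, the MOVER combination in Eqs. (26)-(29) uses the square-root form of Donner and Zou [7]. The sketch below (our illustration) accepts the (lower, upper) limits produced by the helpers sketched earlier, so passing the weighted-beta limits for δ gives MOVER-1 and passing the VST limits gives MOVER-2:

```python
import numpy as np

def mover_interval(mu_hat, sigma2_hat, delta_hat, ci_mu_, ci_s2_, ci_delta_):
    """MOVER interval for xi = ln(1 - delta) + mu + sigma^2/2, Eqs. (26)-(29).

    ci_mu_, ci_s2_, ci_delta_ are (lower, upper) pairs for mu, sigma^2 and delta,
    e.g. from ci_mu / ci_sigma2 / ci_delta_hannig (MOVER-1) or ci_delta_vst (MOVER-2).
    """
    l_mu, u_mu = ci_mu_
    l_s2, u_s2 = ci_s2_
    l_d, u_d = ci_delta_

    xi1 = np.log(1.0 - delta_hat)
    xi2, xi3 = mu_hat, sigma2_hat / 2.0
    xi_hat = xi1 + xi2 + xi3

    # Step 1: combine the intervals for xi2 = mu and xi3 = sigma^2/2, Eq. (26)
    l23 = (xi2 + xi3) - np.sqrt((xi2 - l_mu) ** 2 + (xi3 - l_s2 / 2.0) ** 2)
    u23 = (xi2 + xi3) + np.sqrt((u_mu - xi2) ** 2 + (u_s2 / 2.0 - xi3) ** 2)

    # Step 2: combine with the interval for xi1 = ln(1 - delta), Eqs. (27)-(29)
    L = xi_hat - np.sqrt((xi1 - np.log(1.0 - u_d)) ** 2 + ((xi2 + xi3) - l23) ** 2)
    U = xi_hat + np.sqrt((np.log(1.0 - l_d) - xi1) ** 2 + (u23 - (xi2 + xi3)) ** 2)
    return L, U
```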

3 Simulation Studies
Monte Carlo simulation is used to assess the performance of the presented methods in terms of coverage probability (CP) and average length (AL). The confidence intervals are constructed by the four methods GCI-1, GCI-2, MOVER-1 and MOVER-2. A method is recommended according to two criteria: its coverage probability should be at least, or close to, the nominal confidence level (1 − α), and its average length should be the narrowest.
The simulation settings consist of the mean μ = 0; sample sizes n = 10, 20, 30, 50, 100; probabilities of an additional zero δ = 0.2, 0.5, 0.8; and coefficients of variation τ = 0.2, 0.5, 1, 2. The cases n = 10, 20 with δ = 0.8 and τ = 0.2, 0.5, 1, 2 are excluded, as in Fletcher [8] and Wu and Hsieh [21]. Although the expected number of non-zero observations was below 10, the settings n = 30, δ = 0.8 and τ = 0.2, 0.5, 1, 2 were not discarded in this simulation study because a few methods performed well in these combinations. The nominal confidence level was set to 0.95. A total of 10,000 random samples were generated for each sample size and parameter setting, and 5,000 pivotal quantities were used for the GCI-1 and GCI-2 methods.

Table 1. The coverage probabilities (CP) and average lengths (AL) of 95% two-sided confidence intervals for ξ of the delta-lognormal distribution.
The numerical results are shown in Table 1. They indicate that the coverage probabilities of MOVER-1 were mostly below the nominal confidence level, except in the cases [n = 50, 100, δ = 0.2, τ = 1, 2] and [n = 100, δ = 0.5, τ = 1, 2]. When large values of τ were excluded, MOVER-2 performed well in terms of coverage probability; when its average lengths were also considered, MOVER-2 satisfied both criteria for δ = 0.2 and τ = 1, 2 with sample sizes n = 20, 30, 50. The coverage probabilities of GCI-1 fell below the nominal confidence level in all cases. Conversely, GCI-2 maintained the target coverage probabilities with reasonable average lengths, especially for δ = 0.5, 0.8 and τ = 0.2, 0.5, 1 with sample sizes n = 30, 50, 100.
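The simulation design described above can be reproduced with a loop of the following shape (our sketch; σ² is backed out of τ and δ via Eq. (4), and samples with no zeros or fewer than two positive values are skipped, which is our simplification rather than a rule stated by the authors). Any of the interval constructors sketched above, wrapped to take (y, alpha), can be passed as `method`:

```python
import numpy as np

def simulate_delta_lognormal(n, delta, tau, mu=0.0, rng=None):
    """Draw one delta-lognormal sample; sigma^2 follows from tau via Eq. (4)."""
    rng = np.random.default_rng(rng)
    sigma2 = np.log((1.0 - delta) * (1.0 + tau ** 2))   # exp(s2) = (1-delta)(1+tau^2)
    y = np.exp(rng.normal(mu, np.sqrt(sigma2), size=n))
    y[rng.random(n) < delta] = 0.0                      # n0 ~ Binomial(n, delta)
    return y

def coverage(method, n, delta, tau, mu=0.0, n_rep=10_000, alpha=0.05, seed=1):
    """Estimate the coverage probability and average length of `method` for xi."""
    rng = np.random.default_rng(seed)
    sigma2 = np.log((1.0 - delta) * (1.0 + tau ** 2))
    xi_true = np.log(1.0 - delta) + mu + sigma2 / 2.0
    hits, lengths = 0, []
    for _ in range(n_rep):
        y = simulate_delta_lognormal(n, delta, tau, mu, rng)
        if (y > 0).sum() < 2 or (y == 0).sum() < 1:     # skip degenerate samples
            continue
        lo, hi = method(y, alpha=alpha)
        hits += (lo <= xi_true <= hi)
        lengths.append(hi - lo)
    return hits / len(lengths), float(np.mean(lengths))
```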

4 An Empirical Application
To confirm the simulation results, rainfall data (mm/day) recorded during May 2017 by the Chiang Mai weather station, Northern Meteorological Center, Thailand, are analyzed. In Thailand, the rainy season generally lasts from mid-May to October. During this period, Thai farmers begin cultivating rice and other crops, so rainfall quantity is one of the most important factors for plant growth. The sample consisted of 31 days, of which 8 days had zero rainfall and 23 days had positive rainfall, as detailed in Table 2.

Table 2. The amount of rainfall recorded by the Chiang Mai weather station in May 2017.

Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Rainfall 0.0 2.7 0.0 9.9 5.5 0.0 0.8 4.0 6.9 0.0 0.1 0.0 13.0 0.0 0.5 36.2
Day 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Rainfall 54.8 124.8 0.0 0.8 0.1 0.0 8.6 21.2 53.7 9.7 7.4 14.6 2.7 32.7 0.7

For the test of normality, the p-value of Kolmogorov-Smirnov tests was 0.8697
for the logarithm of positive rainfall on May 2017. Moreover, the mentioned data
contained zero observations. This amount of rainfall clearly fits the character-
istics of delta-lognormal distribution. In fact, the estimated rainfall mean was
θ̂ = 24.79 while the approximate logarithm of the mean of rainfall was ξˆ = 3.21
where μ̂ = 1.64, σ̂ 2 = 3.73, δ̂ = 0.26 and τ̂ = 7.43. Next, the 95% two-sided
confidence intervals for θ were computed based on GCI-1, GCI2, MOVER-1 and
MOVER-2, shown in Table 3. This dataset showed that the confidence intervals
for the mean of rainfall are consistent with the numerical result in the previous
section.

Table 3. Summary of results for the rainfall data: the 95% two-sided confidence intervals for θ using the four methods.

              GCI-1    GCI-2    MOVER-1   MOVER-2
Lower limit   8.88     8.62     10.26     10.11
Upper limit   204.95   223.28   7123.99   7138.77
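For reproducibility, the point estimates quoted above can be recomputed from the Table 2 series with a few lines (our illustration; note that the reported σ̂² = 3.73 corresponds to the unbiased, divisor n₁ − 1, variance of the log-positive values):

```python
import numpy as np

# Daily rainfall (mm) at the Chiang Mai station, May 2017 (Table 2); 0.0 = no rain.
rain = np.array([0.0, 2.7, 0.0, 9.9, 5.5, 0.0, 0.8, 4.0, 6.9, 0.0, 0.1, 0.0,
                 13.0, 0.0, 0.5, 36.2, 54.8, 124.8, 0.0, 0.8, 0.1, 0.0, 8.6,
                 21.2, 53.7, 9.7, 7.4, 14.6, 2.7, 32.7, 0.7])

pos = rain[rain > 0]
n, n1 = rain.size, pos.size
delta_hat = (n - n1) / n                                   # 8/31, about 0.26
mu_hat = np.log(pos).mean()                                # about 1.64
sigma2_hat = np.log(pos).var(ddof=1)                       # unbiased, about 3.73
tau_hat = np.sqrt((np.exp(sigma2_hat) + delta_hat - 1) / (1 - delta_hat))  # ~7.4
xi_hat = np.log(1 - delta_hat) + mu_hat + sigma2_hat / 2   # about 3.2
theta_hat = np.exp(xi_hat)                                 # about 24.8 mm
print(delta_hat, mu_hat, sigma2_hat, tau_hat, theta_hat)
```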

5 Discussion and Conclusion


This paper presented the GCI-1, MOVER-1 and MOVER-2 methods and compared them with GCI-2 for establishing confidence intervals for the mean of the delta-lognormal distribution. Monte Carlo simulation was used to evaluate the performance of all methods in terms of coverage probability and average length. The findings can be summarized briefly as follows. MOVER-2 is the recommended method when the probability of a zero observation is small and the CV is large, for small to moderate sample sizes. GCI-2 satisfied the criteria when the probability of a zero observation is large and the CV is small, for moderate to large sample sizes; moreover, MOVER-2 is computationally simpler than GCI-2. In contrast, GCI-1 and MOVER-1 are not recommended because both have low accuracy in terms of coverage probability and average length. The simulation results for GCI-2 agree with the study of Wu and Hsieh [21].

References
1. Aitchison, J.: On the distribution of a positive random variable having a discrete
probability mass at the origin. J. Am. Stat. Assoc. 50, 901–908 (1955)
2. Bebu, I., Mathew, T.: Comparing the means and variances of a bivariate log-normal
distribution. Stat. Med. 27, 2684–2696 (2008)
3. Callahan, C.M., Kesterson, J.G., Tierney, W.M.: Association of symptoms of
depression with diagnostic test charges among older adults. Ann. Internal Med.
126, 426–432 (1997)
4. Chen, Y.-H., Zhou, X.-H.: Generalized confidence intervals for the ratio or differ-
ence of two means for lognormal populations with zeros, UW Biostatistics Working
Paper Series (2006)
5. Chen, Y.-H., Zhou, X.-H.: Interval estimates for the ratio and difference of two
lognormal means. Stat. Med. 25, 4099–4113 (2006)
6. Dasgupta, A.: Asymptotic Theory of Statistics and Probability Springer Texts in
Statistics. Springer, New York (2008)
7. Donner, A., Zou, G.Y.: Closed-form confidence intervals for functions of the normal
mean and standard deviation. Stat. Methods Med. Res. 21, 347–359 (2010)
8. Fletcher, D.: Confidence intervals for the mean of the delta-lognormal distribution.
Environ. Ecol. Stat. 15, 175–189 (2008)
9. Harvey, J., van der Merwe, A.J.: Bayesian confidence intervals for means and vari-
ances of lognormal and bivariate lognormal distributions. J. Stat. Plann. Infer.
142, 1294–1309 (2012)

10. Hasan, M.S., Krishnamoorthy, K.: Confidence intervals for the mean and a per-
centile based on zero-inflated lognormal data. J. Stat. Comput. Simul. 88, 1499–
1514 (2018)
11. Hannig, J.: On generalized fiducial inference. Stat. Sinica 19, 491–544 (2009)
12. Krishnamoorthy, K., Mathew, T.: Inferences on the means of lognormal distri-
butions using generalized p-values and generalized confidence intervals. J. Stat.
Plann. Infer. 115, 103–121 (2003)
13. Krishnamoorthy, K., Oral, E.: Standardized likelihood ratio test for comparing
several log-normal means and confidence interval for the common mean. Stat.
Methods Med. Res. 0, 1–23 (2015)
14. Lee, J.C., Lin, S.-H.: Generalized confidence intervals for the ratio of means of two
normal populations. J. Stat. Plann. Infer. 123, 49–60 (2004)
15. Li, X., Zhou, X., Tian, L.: Interval estimation for the mean of lognormal data with
excess zeros. Stat. Probab. Lett. 83, 2447–2453 (2013)
16. Owen, W.J., DeRouen, T.A.: Estimation of the mean for lognormal data containing
zeroes and left-censored values, with applications to the measurement of worker
exposure to air contaminants. Biometrics 36, 707–719 (1980)
17. Tian, L., Wu, J.: Confidence intervals for the mean of lognormal data with excess
zeros. Biometrical J. Biometrische Zeitschrift 48, 149–156 (2006)
18. Tian, L., Wu, J.: Inferences on the common mean of several log-normal populations:
the generalized variable approach. Biometrical J. 49, 944–951 (2007)
19. Tian, L.: Inferences on the mean of zero-inflated lognormal data: the generalized
variable approach. Stat. Med. 24, 3223–3232 (2005)
20. Weerahandi, S.: Generalized confidence intervals. J. Am. Stat. Assoc. 88, 899–905 (1993)
21. Wu, W.-H., Hsieh, H.-N.: Generalized confidence interval estimation for the mean
of delta-lognormal distribution: an application to New Zealand trawl survey data.
J. Appl. Stat. 41, 1471–1485 (2014)
22. Zhou, X.H., Tu, W.: Confidence intervals for the mean of diagnostic test charge data containing zeros. Biometrics 56, 1118–1125 (2000)
The Interaction Between Fiscal Policy,
Macroprudential Policy and Financial
Stability in Vietnam-An Application
of Structural Equation Modeling

Nguyen Ngoc Thach1(B) , Tran Thi Kim Oanh2 , and Huynh Ngoc Chuong3
1
International Economics Faculty, Banking University of Ho Chi Minh City,
36 Ton That Dam Street, District 1, HCMC, Vietnam
thachnn@buh.edu.vn
2
Banking University of Ho Chi Minh City,
36 Ton That Dam Street, District 1, HCMC, Vietnam
kimoanhtdnh@gmail.com
3
University of Economics and Law, Vietnam National University - HCMC,
Quarter 3, Linh Xuan Ward, Thu Duc District, Ho Chi Minh City, Vietnam
chuonghn@tdmu.edu.vn

Abstract. This paper studies the impacts of fiscal policy and macroprudential policy, as well as their interaction, on financial stability in Vietnam during the global economic crisis of 2008–2009. Using structural equation modeling (SEM), the study shows that both fiscal policy and macroprudential policy have a strong impact on financial stability. In particular, fiscal policy has a negative effect while macroprudential policy has a positive effect on financial stability in Vietnam. In addition, the results provide evidence of a negative relation between fiscal policy and macroprudential policy in Vietnam. Based on these results, the authors conclude that Vietnam should implement macroeconomic policies, especially fiscal and macroprudential policy, with caution and with consideration of their interaction, in order to take full advantage of their coordination towards financial stability.

Keywords: Financial stability · Structural equation modeling · Fiscal policy · Macroprudential policy

1 Introduction

Vietnam's economy has been increasingly integrated into the global market. However, the development of the Vietnamese financial system remains at a low level, with a critical dependence on the banking sector. Similar to other developing countries, Vietnam's rapid growth over the last three decades has mainly been fueled by domestic and foreign investment channeled through the financial system. This implies that financial stability plays an important role in controlling the macroeconomic situation and thereby enhances Vietnam's economic growth.
The US financial crisis in 2007 had negative impacts not only on US economic growth but also on other economies worldwide, triggering a global economic crisis. To limit its spread, countries developed mechanisms based on the combination of macroeconomic policies, specifically the tripartite coordination of monetary, fiscal and macroprudential policies, to stabilize the macroeconomic situation as well as the financial system. In this context, macroprudential policy, which aims to prevent or limit the contagion effects of economic shocks, has received increasing attention from central banks and financial institutions. Since being first introduced by the Basel Committee on Banking Supervision and the Bank of England in the 1970s, macroprudential policy has been studied and applied more and more widely in many countries as a result of the recent global crisis.
In Vietnam, the term "macroprudential policy" has only recently been mentioned in colloquia. Notably, in 2013 the Economic Commission of the Vietnam National Assembly compiled criteria for assessing macroprudential policy models based on international experience. However, Vietnam has not yet developed a full set of macroprudential policy tools; only a few tools are applied with a view to minimizing the risks of macroeconomic instability to the financial system. Additionally, research on this policy in Vietnam remains limited. Existing papers mainly focus on three issues: (i) the coordination between fiscal and monetary policies in Vietnam; (ii) the effectiveness of implementing fiscal and macroprudential policies individually to pursue financial stability in Vietnam; and (iii) the coordination between monetary and macroprudential policies in the context of economic instability. Currently, there is no research on the coordination between fiscal and macroprudential policies aimed at stabilizing Vietnam's financial system. For those reasons, this paper addresses two issues: (i) evaluating the individual impacts of fiscal and macroprudential policies on financial stability; and (ii) estimating the interaction between fiscal and macroprudential policies in pursuing financial stability in Vietnam.

2 Theoretical Background

2.1 Financial Stability

Smaga [18] indicated that 21 of the 27 central banks in the European Union have already formed their own definitions of the term "financial stability". Although these definitions are not identical, most of them share some common aspects, as listed below:

• The financial system is able to fulfill its functions smoothly, mainly the transfer of savings to investment; financial stability is defined in terms of the system's resilience to economic shocks;
• Financial stability has a positive impact on the real sector.

Unlike terms such as "monetary stability" or "macroeconomic stability", whose analytical frameworks have already been developed and accepted worldwide, there is as yet no agreed analytical framework or standard measurement for "financial stability", according to Gadanecz and Jayaram [3]. Basically, financial stability is characterized by the complex interactions of different sectors in a financial system and is not easily measured by any single indicator. Within a national financial system, different markets have different volatility indices, and the fact that every change in one index affects the other indices is a major challenge for policy makers. Therefore, an aggregate indicator grouping the indices of the different financial markets is needed to describe the volatility of the financial system as a whole.
In a study of Macau, Cheang and Choy [2] use two groups of indicators: the financial vulnerability indicator (FVI) and the regional economic capacity index (RECI). In their study, the FVI contributes 60% and the RECI 40% to the composite index of financial fragility. Morales and Estrada [10] construct a stability index for the financial system in Colombia through three weighting approaches applied to selected variables: profitability (return on assets (ROA), return on equity (ROE), and the ratio of net loan losses to the total loan portfolio), probability of default, and liquidity (the ratio of liquid liabilities to liquid assets and the ratio of interbank funds to liquid assets). They show that all three weighting approaches yield similar financial stability indices. Morris [11] builds measures of financial stability in Jamaica using four equally weighted indicators: a financial development index, a financial vulnerability index, a financial soundness index and a world economic climate index. Petrovska and Mihajlovska [14] developed an aggregate financial stability index for Macedonia with five key variables, in which insolvency was weighted 0.25 while credit risk, profitability, liquidity risk and currency risk were weighted 0.25, 0.2, 0.25 and 0.05, respectively. Finally, Albulescu [1] and Morris [11] used a total of 18 indicators classified into three subindices: a financial development index (4 indicators), a financial vulnerability index (8 indicators) and an insolvency index (3 indicators). The measures of financial stability constructed by Albulescu [1] and Morris [11] assess the stability of the financial system quite comprehensively; in fact, the IMF Global Macroprudential Policy Instruments (GMPI) survey also uses this benchmark (Table 1).

Table 1. Aggregate financial stability indices

Subgroup                       Indicator                                      Research
Financial development index    Market capitalization/GDP                      Albulescu [1], Morris [11], Svirydzenka [19]
                               Total credit/GDP                               Albulescu [1], Morris [11], Svirydzenka [19]
                               Interest spread                                Albulescu [1], Morris [11], Svirydzenka [19]
                               Herfindahl-Hirschmann Index (HHI)              Albulescu [1], Morris [11], Svirydzenka [19]
Financial vulnerability index  Inflation rate                                 Albulescu [1], Morris [11]
                               General budget deficit/surplus (%GDP)          Albulescu [1], Morris [11], Svirydzenka [19]
                               Current account deficit/surplus (%GDP)         Albulescu [1], Morris [11], Svirydzenka [19]
                               Change in real effective exchange rate (REER)  Albulescu [1], Morris [11]
                               Non-governmental credit/total credit           Albulescu [1], Morris [11]
                               Loans/deposits                                 Albulescu [1], Morris [11]
                               Deposits/M2                                    Albulescu [1], Morris [11]
Solvency                       Non-performing loans/total loans               Albulescu [1], Morris [11]
                               Bank equity/total assets                       Albulescu [1], Morris [11]
                               Probability of default                         Albulescu [1], Morris [11]

2.2 Fiscal Policy and Macroprudential Policy


2.2.1 Fiscal Policy
According to Krugman and Wells [8], fiscal policy can be understood as a government's interventions in the tax system and in expenditure in order to achieve macroeconomic objectives such as economic growth, full employment or price stability. Governments usually implement discretionary or automatic fiscal policy, where discretionary fiscal policy refers to the government's deliberate actions to adjust spending and/or revenue. Based on its impact on output, fiscal policy can be categorized as expansionary or contractionary. Expansionary fiscal policy is usually used to boost output growth during a downturn or recession; the absence of inflationary pressure is a favorable condition for expansionary policy. On the contrary, the government implements contractionary policy to slow growth and inflationary pressure when the economy is overheated, with exhausted resources and high inflation. However, changes in the budget deficit or surplus result not only from discretionary fiscal policy but also from automatic stabilizers, which are defined as tools that adjust automatically over the economic cycle.

2.2.2 Macroprudential Policy


While the purpose of fiscal policy (or monetary policy) is to stabilize output after economic shocks, macroprudential policy helps to anticipate and prevent shocks before they occur. This policy uses prudential instruments to limit systemic risks, thereby mitigating default risks and proactively acting against the build-up of systemic financial risks that may have severe consequences for the real sector, as in Nier et al. [12].
Macroprudential policy consists of institutional frameworks, tools and a workforce. In addition, to achieve high efficiency, a strong mechanism of coordination among stakeholders is necessary, as is a harmonized interaction among macroeconomic policies; in particular, coordination between fiscal and macroprudential policies contributes to achieving financial stability. In fact, it is macroprudential policy that monitors macroeconomic policies and ensures their consistent interaction, thus leading to financial stability.
According to the IMF [6], there is no standard macroprudential policy suitable for all countries. The choice of policy instruments depends on the exchange rate regime, the level of development and the vulnerability of the financial system to shocks. Countries usually apply several tools rather than a single one and coordinate them to act against the cyclical nature of the economy. There are many ways to categorize macroprudential instruments, two of which are the subject dimension and the risk dimension. (i) According to the subject dimension, macroprudential tools can be classified into tools affecting borrowers' behavior and capital control tools that mitigate the risk of unstable investment flows. (ii) In terms of the risk dimension, there are credit-related tools that limit loans based on the loan-to-value ratio (LTV), debt-to-income ratio (DTI), foreign currency exposure, credit ceilings or credit growth; liquidity-related tools that limit the net open position and maturity mismatch and regulate reserves; and capital-related tools such as countercyclical/time-varying capital requirements, time-varying/dynamic provisioning and restrictions on profit distribution.

2.3 Interaction Between Fiscal Policy and Macroprudential Policy


As analyzed above, financial stability is the outcome of the interaction among macroeconomic policies. In this study, the authors focus on the interaction between fiscal policy and macroprudential policy, which are two key levers of macroeconomic policy.
To achieve macroeconomic objectives, especially financial stability, good coordination between the government bodies responsible for macroprudential policy and for fiscal policy is required. While the purpose of fiscal policy is to enhance economic growth by affecting aggregate demand, macroprudential policy is aimed at stabilizing the financial system. On the one hand, the government's use of fiscal policy to accelerate economic growth can adversely affect the efficiency of macroprudential policy, meaning that financial stability is achieved only at a higher cost. On the other hand, if macroprudential policy is implemented ineffectively, achieving financial stability only at high cost or failing to achieve it at all, this will have negative impacts on the real sector and thus adverse effects on the final objectives of fiscal policy. In summary, inefficient coordination between macroprudential and fiscal policies will lead to failure in pursuing the policy objectives (financial stability and economic growth) as planned. In particular, ineffective fiscal policy aimed at high GDP growth can have an adverse impact on the goal of macroprudential policy, namely financial stability.
In more detail, the major impacts of fiscal policy on macroprudential policy can be classified as follows:
First, the positive impacts include:
• Fiscal policy is an instrument to control capital inflows and outflows. For example, taxes and charges on holders of foreign financial assets may reduce those assets' attractiveness to residents.
• Effective fiscal policy allows rapid economic growth to be achieved with economical use of public resources. Fast economic development contributes to solving many problems, one of which is financial stability.
• An appropriate approach to fiscal policy, such as a countercyclical policy, allows for a considerable increase in financial stability.
Second, the negative impacts include:
• Expansionary discretionary fiscal policy is likely to cause large long-term budget deficits and high public debt. Budget deficits are often financed from two main sources: (i) public debt and (ii) central bank borrowing. Increasing public debt often creates a "crowding out effect", which reduces private investment and long-term economic growth. If the government borrows from the central bank to cover the budget deficit, inflation will rise, accompanied by a decline in aggregate demand and in economic growth. In addition, if the budget deficit is financed by attracting short-term international capital, these flows fluctuate strongly and may reverse suddenly, with adverse impacts on exchange rates and foreign exchange reserves, and can lead to a current account deficit. It is clear that the instability of international capital flows has a negative impact on financial stability.
• The negative impacts of public debt are expressed in the following aspects: (i) if public debt is too high, the "crowding out effect" also limits potential economic growth, and as a result the ability to prevent a financial crisis is weakened; (ii) excessive government borrowing leads to high spending on debt service and the need to cut budgets for socio-economic development, including financial sustainability; (iii) if the government takes on a large amount of foreign debt, the risk of the national financial system depending on the international financial system becomes enormous; (iv) public debt can be regarded as a deferred tax: sooner or later the debt, together with interest, must be paid, and for long-term debt the payment is transferred to future generations.
• For countries facing serious public debt, fiscal repression is a common tool of fiscal policy. It includes measures such as the placement of government debt with controlled financial institutions (pension funds, central banks); explicit or hidden limits on bank lending rates; limitation of international capital flows; increases in the reserve requirement ratio; and increases in capital requirements for banks and retirement funds [15,16]. All of the above enhance government control of financial resources in order to reduce public debt. In essence, fiscal repression is a latent tax imposed on the operation of financial intermediaries.
The impacts of macroprudential policy on fiscal policy can be described as follows.
First, the positive impacts include:
• Financial stability is a critically important objective of public policy, and macroprudential policy uses prudential tools to prevent or minimize systemic risks and to secure the objective of financial stability, which allows sustainable economic growth to be built. Effective implementation of macroprudential policy therefore ensures stable economic growth and a steady increase in the income of economic entities, which contributes to stabilizing tax revenues. A good source of tax revenues not only maintains the public administration system but also helps solve social problems, creates the essential conditions for long-term economic growth, and ensures government reserves for emergencies such as a financial crisis. In other words, thanks to larger budget revenues, fiscal policy has more scope to carry out the government's regular or emergency tasks and becomes more flexible in solving emergency problems.
Second, the negative impacts include:
• On the contrary, if macroprudential policy is implemented ineffectively, it will increase financial instability and systemic risks, eventually harming economic growth. In that case, the budget revenue collected from economic agents decreases, and consequently fiscal policy becomes less effective.
• Risks deriving from inefficient macroprudential policy result in severe trade deficits, large inflows and outflows of international capital, and asset bubbles, along with interest rate, inflation, liquidity and national debt risks and sudden changes in market sentiment.

3 Data and Research Methodology


Based on theoretical and empirical studies of financial stability, fiscal policy and macroprudential policy, and on data for the period 2000–2015 collected from the IMF, the ADB and Orbis Bank Focus, the authors construct a research model of the relation between fiscal policy and macroprudential policy in pursuing financial stability in Vietnam in the context of the global economic crisis of 2008–2009 (Table 2).

Table 2. Analysis variables

Group                      Indicator                               Expectation  Authors
Fiscal policy              Tax revenue/GDP                         -            Galati and Moessner [4]
                           Government debt/GDP                                  Albulescu [1], Galati and Moessner [4]
                           Expense/GDP                                          Galati and Moessner [4]
Macroprudential policy     Non-performing loans/total loans        +            Lee et al. [9]
                           Credit growth                                        Lee et al. [9]
                           Total loans/total deposits                           Lee et al. [9]
World economic indicators  Average world economic growth           +/-          Albulescu [1] and Morris [11]
                           Average world inflation growth                       Albulescu [1] and Morris [11]
Economic crisis            D = 0 for the period before 2008,       -            Thanh and Tuan [13]
                           D = 1 for the period after 2008
Economic growth            Real GDP growth                         +            Morgan and Pontines [10]
In order to analyze the relation between fiscal policy and macroprudential policy in stabilizing Vietnam's financial system during economic crises, the authors use SEM to evaluate the statistically significant relations between latent variables and observed variables. With SEM, observed variables can be used to measure latent variables. Moreover, because SEM provides ample evidence for forming simple and easy-to-follow indicators, it is especially advantageous for models with many variables representing complex indicators. Last but not least, SEM is also helpful in estimating and separating the direct and indirect impacts of the variables in the model.
In the SEM approach, path diagrams, which resemble flowcharts, play a fundamental role in structural modeling. They show variables interconnected by lines that indicate correlations as well as causal flows. With a classic equation such as

Y = aX + e                                    (1)

we can use arrows to represent the relationships between variables (both observed and latent). Latent variables (including error terms) are placed in ovals or circles, while observed variables are placed in boxes (Fig. 1).

Fig. 1. Simple SEM.
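As a toy illustration of the path-diagram idea (our own sketch, not the authors' estimation code), the structural relation Y = aX + e can be paired with a measurement model in which the latent X is observed only through noisy indicators; a crude proxy-based estimate of the path coefficient then shows the attenuation that full SEM estimation is designed to correct:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                                    # number of observations (illustrative)
a_true = 0.8                               # structural (path) coefficient in Y = aX + e

# Structural part: latent exogenous variable X and outcome Y
X = rng.standard_normal(n)
Y = a_true * X + 0.5 * rng.standard_normal(n)

# Measurement part: three observed indicators of the latent X (the boxes in Fig. 1)
indicators = X[:, None] + 0.3 * rng.standard_normal((n, 3))

# Crude two-step estimate: proxy the latent variable by the mean indicator, then
# regress Y on the proxy.  The slope is slightly attenuated by measurement error;
# full SEM software removes this bias by estimating the measurement and structural
# parts jointly (e.g. by maximum likelihood).
X_proxy = indicators.mean(axis=1)
a_hat = np.cov(X_proxy, Y, bias=True)[0, 1] / np.var(X_proxy)
print(f"true a = {a_true}, proxy-based estimate = {a_hat:.3f}")
```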



For each such relationship, we can write an equation like (1). Based on this background, the authors propose the following SEM (Fig. 2) for the conceptual model:

Fig. 2. The hypothesis-testing model. Source: Authors’ calculation.

In this research, an accurate estimation process is required, since the macroeconomic variables have complicated relations and the direct and indirect effects among factors must be estimated and separated with a limited database. One solution is to estimate the regression equations simultaneously, using probabilistic simulation of the data and repeated (looped) estimation to ensure robust results; this is also a strength of the SEM method. In thi