
A Simple Selection Test Between the Gompertz and Logistic Growth Models

Pierre Nguimkeu
Georgia State University

Abstract

This paper proposes a simple model selection test between the Gompertz and the Logistic growth
models based on parameter significance testing in a comprehensive linear regression. Simulation studies
are provided to show the accuracy of the method. Two real-data examples are also provided to illustrate
the implementation of the proposed method in practice.

Keywords: Gompertz function, Logistic function, Model selection, t-test

1 Introduction
Let $Y_t$ be a time series taking nonnegative values. The Gompertz trend curve for $Y_t$ is given by

$$Y_t = \alpha_1 \exp(-\beta_1 e^{-\gamma_1 t}), \qquad (1)$$

and the Logistic trend curve for $Y_t$ is given by

$$Y_t = \alpha_2 (1 + \beta_2 e^{-\gamma_2 t})^{-1}, \qquad (2)$$

where $t$ represents time and $\alpha_i$, $\beta_i$, $\gamma_i$, $i = 1, 2$, are positive parameters. Model (1) and Model (2), together
with their multi-response and multivariate generalizations, are now widely used in applied research work
for modelling and forecasting the behaviour of many diffusion processes like the adoption rate of technology
based products (Chu et al. 2009, Gamboa and Otero 2009), population growth (Nguimkeu and Rekkas 2011,
Meade 1988), and marketing development (Mahajan et al. 1990, Meade 1984). In fact, the Gompertz and
Logistic curves share the useful property that their S-shaped form is suitable for describing
processes that consist of a slow early adoption stage, followed by a phase of rapid adoption which then tails
off as the adopting population becomes saturated. However, despite these visual and numerical similarities
there are fundamental differences between the two curves, and one of the most important is that the Logistic
function is symmetric about its inflection point whereas the Gompertz function is asymmetric. Failing to
account for these differences and choosing an inappropriate growth curve for inference can lead to seriously
misleading forecasts (see Chu et al. 2009 and Yamakawa et al. 2013 for some empirical illustrations). A
reliable selection procedure to discriminate between the two models in practice is therefore needed.

Corresponding author. Andrew Young School of Policy Studies, Georgia State University; 14 Marietta Street NW, Suite
524, Atlanta, GA 30303, USA; Tel. (1)404.413.0162; Email: nnguimkeu@gsu.edu.
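The difference in shape between the two curves can be checked numerically. The following is a minimal sketch, assuming illustrative parameter values that are not taken from the paper; the helper name `asymmetry` is hypothetical. It evaluates both curves at their inflection point, which for the parameterizations in (1) and (2) lies at $t^* = \ln(\beta)/\gamma$, and measures how far each curve departs from point symmetry about it.

```python
import math

def gompertz(t, alpha, beta, gamma):
    """Gompertz trend curve: alpha * exp(-beta * exp(-gamma * t))."""
    return alpha * math.exp(-beta * math.exp(-gamma * t))

def logistic(t, alpha, beta, gamma):
    """Logistic trend curve: alpha / (1 + beta * exp(-gamma * t))."""
    return alpha / (1.0 + beta * math.exp(-gamma * t))

def asymmetry(curve, t_star, d, *params):
    """curve(t*+d) + curve(t*-d) - 2*curve(t*); zero for a point-symmetric curve."""
    return (curve(t_star + d, *params) + curve(t_star - d, *params)
            - 2.0 * curve(t_star, *params))

# Illustrative parameter values only (assumed, not taken from the paper).
alpha, beta, gamma = 10.0, 5.0, 0.3

# For both curves the inflection point is at t* = ln(beta) / gamma.
t_star = math.log(beta) / gamma

# Share of the saturation level alpha reached at the inflection point.
gompertz_share = gompertz(t_star, alpha, beta, gamma) / alpha  # equals 1/e
logistic_share = logistic(t_star, alpha, beta, gamma) / alpha  # equals 1/2

# Departure from point symmetry, five periods either side of the inflection point.
gompertz_gap = asymmetry(gompertz, t_star, 5.0, alpha, beta, gamma)
logistic_gap = asymmetry(logistic, t_star, 5.0, alpha, beta, gamma)

print(f"Gompertz: share at inflection {gompertz_share:.3f}, symmetry gap {gompertz_gap:.3f}")
print(f"Logistic: share at inflection {logistic_share:.3f}, symmetry gap {logistic_gap:.3f}")
```

The Logistic curve reaches half of its saturation level at the inflection point and its symmetry gap vanishes, while the Gompertz curve inflects earlier, at a share of $1/e$, and its gap does not vanish.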

Unfortunately, despite the practical importance of selecting between these models, a formal framework
for statistically testing one against the other rarely exists. The selection is usually made on an ad hoc
basis using criteria based on forecasting errors, on the plausibility of the estimated saturation levels, or on
visual evidence obtained from plotting the data in a special way; see, for example, Gregg et al. (1964). A
notable exception is the approach of Franses (1994), who proposed a selection based on statistical significance
testing in an auxiliary regression, which we briefly discuss in Section 2. Other approaches are based on
goodness-of-fit criteria that require estimating the two models and then comparing their fits with historical
data through measures like $R^2$ or RMSE (see Chu et al. 2009, Yamakawa et al. 2013). Such a procedure is,
however, unattractive, as it requires estimating both models by nonlinear regression methods involving
numerical optimization, which is usually computationally expensive and time consuming. There is thus a clear
need for selection methods between Gompertz and Logistic models that are easy to understand and inexpensive
to compute. In this context, it seems natural to investigate the use of statistical tests that require simple
estimation and easy computation.

This paper proposes a model selection test based on one linear regression and the significance test of
one parameter. Our approach is therefore similar in spirit to the one proposed by Franses (1994), who also
based his method on a single-parameter significance test. However, whereas the Franses (1994) method
first requires imputing the original data in order to obtain only strictly positive increments of $Y_t$, our
approach is based on the original responses themselves regardless of their values. Thus, there is no loss or
distortion of information that could undermine the result of our test, which at the same time is more
straightforward to compute. We examine the empirical size and power performance of the proposed test
through Monte Carlo simulations and also provide real-data examples to illustrate its usefulness in practice.
The results show that the proposed test performs reasonably well in finite samples and could be a better
alternative to the Franses test.

In Section 2 we discuss the transformations of the Gompertz and Logistic curves leading to our selection
procedure as well as the difference between our test and the Franses (1994) method. Section 3 provides
numerical studies including Monte Carlo simulations and two real-data examples. Some concluding remarks
are given in Section 4.

2 The Selection Procedure
Recall that $Y_t$ is our variable of interest and denote by $y_t = (Y_t - Y_{t-1})/Y_{t-1}$ the relative increase in $Y_t$. Let
the Gompertz response function in (1) be denoted by $g(t)$:

$$g(t) = \alpha_1 \exp(-\beta_1 e^{-\gamma_1 t}).$$

Differentiating $g(t)$ and rearranging terms yields

$$\frac{g'(t)}{g(t)} = \gamma_1 [\ln \alpha_1 - \ln g(t)].$$

This suggests setting up a simple linear regression for the Gompertz model given in (1) with the form

$$H_0: \; y_t = \delta_1 + \theta_1 \ln Y_{t-1} + u_{1t}.$$
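The rearrangement can be made explicit. The steps below are a worked version of the differentiation for the Gompertz curve, using only the definition of $g(t)$; the closing remark on how the continuous-time growth rate maps to the regression is an interpretation rather than a statement from the text.

```latex
\begin{align*}
\ln g(t) &= \ln \alpha_1 - \beta_1 e^{-\gamma_1 t}, \\
\frac{g'(t)}{g(t)} &= \frac{d}{dt} \ln g(t) = \beta_1 \gamma_1 e^{-\gamma_1 t}, \\
\beta_1 e^{-\gamma_1 t} &= \ln \alpha_1 - \ln g(t)
\quad \Longrightarrow \quad
\frac{g'(t)}{g(t)} = \gamma_1 \left[ \ln \alpha_1 - \ln g(t) \right].
\end{align*}
```

Replacing $g'(t)/g(t)$ by the discrete relative increase $y_t$ and $g(t)$ by $Y_{t-1}$ gives a regression of $y_t$ on a constant and $\ln Y_{t-1}$, in which the constant plays the role of $\gamma_1 \ln \alpha_1$ and the slope the role of $-\gamma_1$.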

Likewise, if we denote by $h(t)$ the Logistic response function in (2),

$$h(t) = \alpha_2 (1 + \beta_2 e^{-\gamma_2 t})^{-1},$$

a similar manipulation leads to the differential equation

$$\frac{h'(t)}{h(t)} = \frac{\gamma_2}{\alpha_2} [\alpha_2 - h(t)].$$

Hence, a linear regression model of the form

$$H_1: \; y_t = \delta_2 + \lambda_2 Y_{t-1} + u_{2t}$$

can be set up for the Logistic model given in (2). Testing (1) against (2) is therefore equivalent to testing
the hypothesis $H_0$ against $H_1$. Models (1) and (2) as well as Hypotheses $H_0$ and $H_1$ are clearly nonnested
in the sense of Cox (1961).

Figure 1: The Gompertz and Logistic Curves. [Left panel: $Y$ against $t$ for the Gompertz and Logistic curves; right panel: the relative increase in $Y$ against $Y$ for the same two curves.]

Following Davidson and MacKinnon (1981), an artificial comprehensive model
can therefore be formulated as follows:

$$H_{01}: \; y_t = \delta + \theta \ln Y_{t-1} + \lambda Y_{t-1} + u_t,$$

where $u_t$ is an error term. It can be seen that when $\lambda = 0$, $H_{01}$ reduces to $H_0$. Thus, it might seem that
to test $H_0$ against $H_1$ we could simply estimate this model and test whether the coefficient $\lambda$ on $Y_{t-1}$ is zero.$^1$ However, since the
variable $Y_t$ may be nonstationary, estimating $H_{01}$ by ordinary least squares might lead to inaccurate
estimates of $\lambda$. A simpler and more accurate way to base a test on $H_{01}$ is to estimate a first-differenced
version of it, given by

$$H'_{01}: \; \Delta y_t = \delta + \theta \, \Delta \ln Y_{t-1} + \lambda \, \Delta Y_{t-1} + \varepsilon_t,$$

where $\varepsilon_t$ is an error term that can be assumed to be $NID(0, \sigma^2)$. We can estimate Model $H'_{01}$ by ordinary
least squares and test the null hypothesis that $\lambda = 0$ using an ordinary t-test at the desired significance level.
This provides an easy and reliable way to test $H_0$. Note that including the constant term is
not strictly needed in theory for the comprehensive specification of the differenced model, but it is useful in
practice, for example to control for a possible nonzero mean in the error term, and it does not bias
the coefficients. Also, normality of the error terms is not an innocuous assumption, but it is
common practice in time series analysis for estimating parameters, and the resulting estimates have a number
of desirable properties even if the errors are non-normal. The selection method between a Gompertz and a
Logistic curve based on $H'_{01}$ uses all the in-sample observations. Hence, no observations are set aside for
out-of-sample forecasting performance evaluation. This is important in practice, where only small samples
are usually available. The graphical illustration given in Figure 1 is also instructive. It shows a relationship
between $y_t$ and $Y_{t-1}$ that is logarithmic for the Gompertz process and linear for the Logistic process. This
may help guide the data analysis in practice, although a selection based only on visual evidence
could be misleading or imprecise. Other graphical methods based on different types of transformations of
the variable of interest are also available in Harvey (1984) and Franses (1994).
1 This idea is similar to the J test that was first suggested by Davidson and MacKinnon (1981) for nonnested regressions.
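The selection procedure of this section can be sketched in a few lines. This is a minimal illustration rather than the author's implementation: the function name `selection_t_stat`, the alignment of the differenced series, the degrees-of-freedom count and the simulated series are all assumptions made for the example. It builds $y_t$ from the raw series, takes first differences of $y_t$, $\ln Y_{t-1}$ and $Y_{t-1}$, runs ordinary least squares with a constant, and returns the t-statistic of the coefficient on the differenced level term.

```python
import numpy as np

def selection_t_stat(Y):
    """t-statistic of the coefficient on Delta Y_{t-1} in the differenced regression.

    Regresses Delta y_t on a constant, Delta ln Y_{t-1} and Delta Y_{t-1} by OLS,
    where y_t = (Y_t - Y_{t-1}) / Y_{t-1}. Requires a strictly positive series.
    """
    Y = np.asarray(Y, dtype=float)
    y = (Y[1:] - Y[:-1]) / Y[:-1]      # relative increase y_t
    lag = Y[:-1]                       # Y_{t-1}, aligned with y_t
    dy = np.diff(y)                    # Delta y_t
    d_log = np.diff(np.log(lag))       # Delta ln Y_{t-1}
    d_lev = np.diff(lag)               # Delta Y_{t-1}
    X = np.column_stack([np.ones_like(dy), d_log, d_lev])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    dof = len(dy) - X.shape[1]         # observations minus three coefficients
    sigma2 = resid @ resid / dof       # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[2] / np.sqrt(cov[2, 2])), dof

# Hypothetical usage on a simulated noisy Logistic series (values assumed).
rng = np.random.default_rng(0)
t = np.arange(1, 31)
Y = 20.0 / (1.0 + 10.0 * np.exp(-0.5 * t)) + rng.normal(0.0, 0.1, size=t.size)
t_stat, dof = selection_t_stat(Y)
print(f"t-statistic: {t_stat:.3f} with {dof} degrees of freedom")
```

The statistic is then compared with the critical value of a Student t distribution with `dof` degrees of freedom: a significant value points to the Logistic curve, an insignificant one to the Gompertz curve.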
Franses (1994) showed that the Gompertz growth model given by (1) could be rewritten in the form

$$\log(\Delta \log Y_t) = a + bt, \qquad (3)$$

and put forward a testing procedure that involves estimating by ordinary least squares the auxiliary regression

$$\log(\Delta \log Y_t) = a + bt + ct^2 + \eta_t \qquad (4)$$

and testing whether the estimated coefficient $c$ is statistically different from zero. If this
coefficient turns out to be statistically different from zero then a Logistic specification should be estimated;
otherwise, a specification based on the Gompertz curve should be preferred. One major drawback of the
Franses (1994) procedure, however, is that, in practice, the values of $\Delta \log Y_t$ may be negative, so that it
would not be possible to apply the second logarithmic transformation in the left-hand side of Equations (3)
and (4). Franses (1994) suggested that such observations be replaced by interpolated values or be treated as
missing, a solution that may well distort the original information, undermine the quality of the estimates, or
at least require the researcher to spend extra time imputing the data. In contrast, the testing procedure
proposed in this paper uses the original data available and is straightforward and readily applicable without
requiring any further data imputation.
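For comparison, the auxiliary regression (4) can be sketched as follows. This is an illustrative reading of the procedure described above, not code from Franses (1994): the function name `franses_t_stat` and the simulated series are hypothetical, and observations with nonpositive $\Delta \log Y_t$ are simply dropped, which corresponds to the "treated as missing" option mentioned in the text.

```python
import numpy as np

def franses_t_stat(Y):
    """t-statistic on t^2 in log(Delta log Y_t) = a + b t + c t^2 + error.

    Increments with Delta log Y_t <= 0 are dropped (treated as missing),
    because the outer logarithm is undefined for them.
    Returns the t-statistic and the number of dropped observations.
    """
    Y = np.asarray(Y, dtype=float)
    dlog = np.diff(np.log(Y))                      # Delta log Y_t for t = 2..n
    t = np.arange(2, len(Y) + 1, dtype=float)      # time index of each increment
    keep = dlog > 0                                # usable observations
    z = np.log(dlog[keep])
    tt = t[keep]
    X = np.column_stack([np.ones_like(tt), tt, tt ** 2])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    dof = len(z) - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return float(coef[2] / np.sqrt(cov[2, 2])), int((~keep).sum())

# Hypothetical usage on a simulated noisy Gompertz series (values assumed).
rng = np.random.default_rng(1)
t = np.arange(1, 26)
Y = 20.0 * np.exp(-3.0 * np.exp(-0.15 * t)) + rng.normal(0.0, 0.05, size=t.size)
t_stat, dropped = franses_t_stat(Y)
print(f"t-statistic on t^2: {t_stat:.3f}; observations dropped: {dropped}")
```

The count of dropped observations makes the drawback discussed above visible: every nonpositive increment shortens an already small sample.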

3 Numerical Studies
In this section, we provide both a Monte Carlo simulation study to gain a practical understanding of the
performance of our testing procedure and an application to real-data examples to show how the test
could be used in practice. The focus of the simulation is to examine the size of the test, i.e., the frequency
of Type I errors, and the power of the test, i.e., its ability to reject the wrong model. The results from the
proposed test (denoted Proposed) are also compared with those of the Franses (1994) method (denoted Franses).

3.1 Monte Carlo Simulation

This section reports the results of a Monte Carlo study conducted to assess the small-sample performance
of the proposed test and also compare it to the Franses (1994) approach. Two data generating processes are
considered:
$$\text{DGP0}: \; Y_t = 20 \exp(-\beta_1 e^{-\gamma_1 t}) + u_{1t}, \qquad u_{1t} \sim N(0, 0.1)$$

$$\text{DGP1}: \; Y_t = 20 (1 + \beta_2 e^{-\gamma_2 t})^{-1} + u_{2t}, \qquad u_{2t} \sim N(0, 0.01)$$

The first part of the experiment involves estimating probabilities of a Type I error under DGP0 at
$\beta_1 \in \{1, 2, 3, 4\}$ and $\gamma_1 \in \{0.05, 0.07, 0.10, 0.12, 0.15\}$ at the 5% nominal level. The second part involves calculating
the power of the tests by estimating the rejection probabilities of the tests under DGP1 for $\beta_2 \in \{5, 10, 15\}$
and $\gamma_2 \in \{0.3, 0.5, 0.6, 0.7, 0.8, 0.9\}$ at the 5% level. We consider sample sizes of n = 20, n = 25 and n = 30
with 1000 replications each.

Table 1: Estimated size function for testing DGP0 against DGP1, at 5% significance level

                        n = 20                                          n = 30
$\beta_1$   $\gamma_1$   Proposed   Franses        $\beta_1$   $\gamma_1$   Proposed   Franses
1           0.05         0.001      0.000          1           0.05         0.000      0.000
            0.07         0.018      0.000                      0.10         0.000      0.000
            0.10         0.012      0.000                      0.15         0.003      0.000
            0.12         0.028      0.002                      0.10         0.034      0.001
            0.15         0.031      0.000                      0.15         0.072      0.003
2           0.05         0.011      0.000          2           0.05         0.000      0.000
            0.07         0.002      0.002                      0.10         0.000      0.001
            0.10         0.011      0.000                      0.15         0.001      0.001
            0.12         0.024      0.001                      0.10         0.042      0.006
            0.15         0.000      0.010                      0.15         0.003      0.000
3           0.05         0.000      0.001          3           0.05         0.000      0.000
            0.07         0.003      0.009                      0.10         0.000      0.000
            0.10         0.065      0.000                      0.15         0.003      0.000
            0.12         0.002      0.010                      0.10         0.000      0.008
            0.15         0.002      0.014                      0.15         0.007      0.008
4           0.05         0.003      0.005          4           0.05         0.000      0.000
            0.07         0.001      0.000                      0.10         0.001      0.001
            0.10         0.003      0.000                      0.15         0.000      0.000
            0.12         0.004      0.229                      0.10         0.005      0.002
            0.15         0.003      0.017                      0.15         0.000      0.001
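The simulation design described above can be sketched as follows. Several choices here are assumptions made only for illustration: the time index runs over $t = 1, \dots, n$, the second argument of $N(\cdot, \cdot)$ is read as a variance, draws containing nonpositive values are skipped because $\ln Y_{t-1}$ is undefined for them, and the test regression loses five degrees of freedom relative to $n$. The names `simulate` and `rejection_rate` are hypothetical. The critical values are the standard two-sided 5% points of the Student t distribution.

```python
import numpy as np

# Two-sided 5% Student t critical values for dof = n - 5 with n = 20, 25, 30.
T_CRIT = {15: 2.131, 20: 2.086, 25: 2.060}

def selection_t_stat(Y):
    """t-statistic on Delta Y_{t-1} in the differenced comprehensive regression."""
    y = (Y[1:] - Y[:-1]) / Y[:-1]
    lag = Y[:-1]
    dy = np.diff(y)
    X = np.column_stack([np.ones_like(dy), np.diff(np.log(lag)), np.diff(lag)])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    dof = len(dy) - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return coef[2] / np.sqrt(cov[2, 2]), dof

def simulate(dgp, beta, gamma, n, sd, rng):
    """Draw one series of length n from the Gompertz or Logistic process."""
    t = np.arange(1, n + 1)
    if dgp == "gompertz":
        trend = 20.0 * np.exp(-beta * np.exp(-gamma * t))
    else:
        trend = 20.0 / (1.0 + beta * np.exp(-gamma * t))
    return trend + rng.normal(0.0, sd, size=n)

def rejection_rate(dgp, beta, gamma, n, sd, reps=1000, seed=0):
    """Share of valid replications in which the null (Gompertz) is rejected at 5%."""
    rng = np.random.default_rng(seed)
    rejections, valid = 0, 0
    for _ in range(reps):
        Y = simulate(dgp, beta, gamma, n, sd, rng)
        if np.any(Y <= 0):          # log undefined: skip this draw (assumption)
            continue
        t_stat, dof = selection_t_stat(Y)
        valid += 1
        rejections += int(abs(t_stat) > T_CRIT[dof])
    return rejections / max(valid, 1)

# Size: data generated under the Gompertz null; power: under the Logistic alternative.
size = rejection_rate("gompertz", beta=2.0, gamma=0.10, n=20, sd=0.1 ** 0.5)
power = rejection_rate("logistic", beta=10.0, gamma=0.5, n=20, sd=0.01 ** 0.5)
print(f"estimated size: {size:.3f}, estimated power: {power:.3f}")
```

Because of the assumptions listed above, the numbers produced by this sketch need not match the entries of Tables 1 and 2.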
The empirical size performance of the tests is presented in Table 1 for the sample sizes n = 20 and
n = 30, and in Figure 2 for n = 25, at a nominal significance level of 5%. The results indicate that
while the empirical size of both tests can be below the nominal level of 5%, the proposed test clearly
dominates the Franses test, whose rejection probabilities tend to be consistently close to zero. The size of the
tests does not seem to be sensitive to the different sample sizes considered. The results of the power
study are displayed in Table 2 for sample sizes n = 20 and n = 30, and in Figure 3 for the sample size
n = 25. The power of the proposed test is reasonably high in most cases and occasionally hits the limit of 1.

Compared to the Franses test, the proposed test performs remarkably better. In fact, although the
Franses test also exhibits high power in many cases, there are several cases in which it completely lacks power.
This is not surprising, given the nature of this test, which is partially based on a quadratic approximation of
the original responses (see Equation (4) above and the discussion in Franses 1994). The test may therefore
lack power in some instances, perhaps because the quadratic function has neither an inflection point nor a
saturation level, two key features of the functional forms being tested.

Figure 2: Size function for n = 25. [Four panels, for $\beta_1$ = 1, 2, 3 and 4; each plots the rejection probability against $\gamma_1$ for the Proposed and Franses tests, together with the nominal level.]

Table 2: Estimated power function for testing DGP0 against DGP1, at 5% significance level

                        n = 20                                          n = 30
$\beta_2$   $\gamma_2$   Proposed   Franses        $\beta_2$   $\gamma_2$   Proposed   Franses
5           0.3          0.530      0.106          5           0.3          0.445      0.003
            0.5          0.102      0.001                      0.5          0.254      0.059
            0.6          0.574      0.051                      0.6          0.166      0.828
            0.7          0.737      0.288                      0.7          0.343      0.827
            0.8          0.481      0.096                      0.8          0.595      0.543
            0.9          0.832      0.387                      0.9          0.395      0.826
10          0.3          0.221      0.971          10          0.3          0.074      0.064
            0.5          0.839      0.004                      0.5          0.281      0.317
            0.6          0.391      0.005                      0.6          0.331      0.961
            0.7          0.418      0.000                      0.7          0.607      0.718
            0.8          0.358      0.136                      0.8          0.312      0.979
            0.9          0.674      0.063                      0.9          0.671      0.528
15          0.3          0.160      0.982          15          0.3          0.047      0.094
            0.5          0.455      0.144                      0.5          0.179      0.032
            0.6          0.930      0.012                      0.6          0.181      0.340
            0.7          0.440      0.005                      0.7          0.484      0.626
            0.8          0.692      0.015                      0.8          0.594      0.664
            0.9          1.000      0.052                      0.9          0.950      0.412

Figure 3: Power function for n = 25. [Six panels, for $\beta_2$ = 5, 10, 15, 20, 25 and 30; each plots the rejection probability against $\gamma_2$ for the Proposed and Franses tests.]

3.2 Empirical examples

An application of the proposed selection method is illustrated using two examples taken from Franses (1994).
The first example consists of the official figures on tractor ownership in Spain over the period 1951-1976.
The observations are plotted in Figure 4. The right panel of Figure 4 also depicts the graph of the $y_t$ series
with respect to the $Y_{t-1}$ series.

A visual analysis of this graph shows a curvature in the relationship between $y_t$ and $Y_{t-1}$ similar to
the stylized relationship depicted by the right panel of Figure 1. Although this empirical graph should be
interpreted with caution, it seems to suggest that the series is closer to a Logistic process. This conjecture
is further confirmed by the t-statistic for the coefficient $\lambda$ on $\Delta Y_{t-1}$ in model $H'_{01}$, which has a value of 2.658 and is thus
statistically different from zero at the 5% significance level. This result is consistent with the findings of several authors,
including Harvey (1984) and Mar-Molinero (1980), who also argued that the tractors data in Spain followed
a Logistic growth curve, as well as Franses (1994), who obtained a selection t-statistic of 3.7404.

Figure 4: Plots of $Y_t$ and plots of $y_t$ with respect to $Y_{t-1}$ for Tractors in Spain. [Left panel: $Y_t$ (Tractors) against $t$; right panel: $y_t$ against $Y_{t-1}$.]

The second example uses the annual stock of cars series in the Netherlands from 1965 to 1989. The
graph of the $y_t$ series with respect to the $Y_{t-1}$ series is depicted in Figure 5 and seems to visually suggest a
logarithmic relationship similar to the stylized one depicted in the right panel of Figure 1, so that a
Gompertz curve may indeed be appropriate. The graphical evidence is, however, not
very convincing in these examples, which is why such an approach should be used with caution. The value
of the t-statistic for the coefficient $\lambda$ on $\Delta Y_{t-1}$ in $H'_{01}$ is 0.123, which is not statistically significant at the 10% level,
therefore confirming that the Gompertz curve is more adequate. This result is also consistent with that of
Franses (1994), who obtained a selection t-statistic of 1.031.

Figure 5: Plots of $Y_t$ and plots of $y_t$ with respect to $Y_{t-1}$ for Cars in the Netherlands. [Left panel: $Y_t$ (Cars) against $t$; right panel: $y_t$ against $Y_{t-1}$.]

4 Conclusion
This paper has provided a model selection test between the Gompertz and the Logistic models. The test
exploits the differential equations underlying both processes, which can be estimated and tested in
the form of linear regressions. The test is more insightful and more accurate than alternative approaches
currently used in practice. It is also easier to compute than the Franses (1994) selection test, as it
uses readily available data and does not require any further data imputation as the latter does. Simulation
results show that the test has acceptable size, although it can be conservative. Simulations also show that
in most cases the power is reasonably high and occasionally reaches the limit of one. To illustrate the practical
use of the proposed method, two real-data examples are provided. The idea of the test developed in this paper can be
extended to model selection between other types of growth curves.

References
[1] Chu, W.L., Wu, F.S., Kao, K.S., Yen, D.C. Diffusion of mobile telephony: An empirical study in
Taiwan. Telecommunications Policy 33 (2009) 506-520.

[2] Cox, D.R. Tests of Separate Families of Hypotheses. Proceedings of the Fourth Berkeley Symposium
on Mathematical Statistics and Probability, Vol. 1. Berkeley: University of California Press, 1961.

[3] Davidson, R., MacKinnon, J. Several Tests for Model Specification in the Presence of Alternative
Hypotheses. Econometrica 49 (3) (1981) 781-793.

[4] Franses, P.H. A method to select between Gompertz and Logistic trend curves. Technological Forecasting
& Social Change 46 (1994) 45-49.

[5] Gamboa, L.F., Otero, J. An estimation of the pattern of diffusion of mobile phones: The case of
Colombia. Telecommunications Policy 33 (2009) 611-620.

[6] Gregg, J.V., Hossel, C.H., Richardson, J.T. Mathematical Trend Curves: An Aid to Forecasting.
ICI Monograph 1, Edinburgh: Oliver and Boyd, 1964.

[7] Harvey, A.C. Time Series Forecasting Based on the Logistic Curve. Journal of the Operational Research
Society 35 (1984) 641-646.

[8] Mar-Molinero, C. Tractors in Spain: A Logistic Analysis. Journal of the Operational Research Society
31 (1980) 141-152.

[9] Mahajan, V., Muller, E., Bass, F.M. New product diffusion models in marketing: a review and directions
for research. Journal of Marketing 54 (1990) 1-26.

[10] Meade, N. The Use of Growth Curves in Forecasting Market Development: a Review and Appraisal.
Journal of Forecasting 3 (1984) 429-451.

[11] Meade, N. A modified logistic model applied to human populations. Journal of the Royal Statistical
Society, Series A 151 (1988) 491-498.

[12] Nguimkeu, P.E., Rekkas, M. Third-order Inference for Autocorrelation in Nonlinear Regression Models.
Journal of Statistical Planning and Inference 141 (2011) 3413-3421.

[13] Yamakawa, P., Rees, G.H., Salas, J.M., Alva, N. The diffusion of mobile telephones:
An empirical analysis for Peru. Telecommunications Policy 37 (2013) 594-606.
