
Online estimation of stochastic volatility for asset returns

Ivette Luna and Rosangela Ballini


Abstract: This paper suggests an adaptive fuzzy rule-based system applied as a financial time series model for volatility forecasting. The model is based on Takagi-Sugeno fuzzy systems and is built in two phases. In the first phase, the model uses the Subtractive Clustering algorithm to determine group structures in a reduced data set, for initialization purposes. In the second phase, the system is modified dynamically via adding and pruning operators and a recursive learning algorithm based on the Expectation-Maximization (EM) optimization technique. The online algorithm automatically determines the number of fuzzy rules necessary at each step, while one-step-ahead predictions are estimated and the parameters are updated as well. The model is applied to forecasting financial time series volatility, considering daily values of the REAL/USD exchange rate, and is compared against generalized autoregressive conditional heteroskedasticity (GARCH) models. Experimental results show the adequacy of the adaptive fuzzy approach for volatility forecasting purposes.

I. Introduction

Since the 1990s, Value-at-Risk (VaR) has become a standard, comprehensive risk measure that summarizes the overall market risk exposure in one single quantitative parameter [1]. Its quantification is of great importance for helping financial institutions control and manage the risk related to their business activities, as well as for regulatory committees to set margin requirements [2]. Several approaches have been proposed in the literature for estimating VaR [3]; a comparison study considering several of them is detailed in [4]. One of the most common approaches used for this purpose is the generalized autoregressive conditional heteroskedasticity (GARCH) family of models [5], a type of non-linear time series model that has become a standard tool for estimating the volatility of financial market data. During the last decades, different non-linear strategies based on computational intelligence tools have been widely studied and applied to time series forecasting in different areas, and more recently their application has been extended to economic and financial problems. For example, the works detailed in [2], [4] and [6] suggest the use of different types of neural networks for forecasting the volatility of return series. On the other hand, the proposal described in [7] suggests the use of a fuzzy inference system for the estimation of the Brazilian central bank's reaction function.

In addition, several works suggest the use of hybrid approaches, combining econometric models and neural networks in order to harness the potential of each approach and obtain better results than the individual ones [8], [9]. All these works intend, in some way, to deal with the non-linearities and uncertainties present in economic and financial systems. An excellent survey of the evidence of non-linearities in financial data is provided in [5]. An alternative way to handle this situation, particularly the issues connected with uncertainty and imprecision about theoretical relationships, is to apply the framework of fuzzy inference systems. Fuzzy inference systems have been successfully applied in fields such as automatic control, data classification, decision analysis, expert systems and time series forecasting. More recently, a particular type of fuzzy model has been widely studied. These are called adaptive fuzzy systems and emerge as an attempt to decrease the difficulty of choosing parameters when building time series models, at a low computational cost. These models have the ability to design the input space partition automatically, as well as the capacity to adapt to possible changes in the dynamics of the system. The papers presented in [10], [11], [12], [13], [14] and [15] are just some examples of this kind of model, which has shown high flexibility in capturing the characteristics of the data. Therefore, we propose a method to estimate VaR using an adaptive fuzzy inference system (AdaFIS) to enhance predictive power, by directly estimating volatility and using it to obtain more accurate VaR estimates. To verify the model performance, out-of-sample tests are performed, and the results are compared with those achieved by a traditional and well-established model for estimating return volatility, namely the GARCH model. The case study presented in this work analyses the exchange rate of Brazilian Reals (R$) per U.S. Dollar (REAL/USD). The results show the adaptive fuzzy model as an alternative for estimating the VaR. After this introduction, the paper proceeds as follows. The general structure of the AdaFIS is presented in Section II. Section III presents in detail the adaptive learning method proposed. The empirical design and the analysis of results are shown in Section IV. Finally, some conclusions and directions for further research are presented in Section V.

II. Model structure

The online FIS is based on a first-order Takagi-Sugeno (TS) fuzzy system [16]. The general structure of this adaptive fuzzy inference system is illustrated in Fig. 1.

Ivette Luna and Rosangela Ballini are with the Department of Economic Theory, Institute of Economics, State University of Campinas, São Paulo, Brazil (e-mail: {ivette, ballini}@eco.unicamp.br). This work was supported by XXXYYY.

Fig. 1. A general FIS structure.

In Fig. 1, x^k = [x_1^k, x_2^k, ..., x_p^k] ∈ R^p denotes the input vector at instant k, k ∈ Z_0^+, and y^k ∈ R is the model output for the corresponding input x^k. The input space, represented by x^k ∈ R^p, is partitioned into M sub-regions, each of them represented by a fuzzy rule; k = 0, 1, 2, ... is the time index. The antecedents of each fuzzy If-Then rule (R_i) are represented by their respective centers c_i ∈ R^p and covariance matrices V_i ∈ R^{p×p}. The consequents are represented by local linear models, with outputs y_i^k, i = 1, ..., M, defined by:

y_i^k = \varphi^k \theta_i^T \qquad (1)

where \varphi^k = [1 \; x_1^k \; x_2^k \; \dots \; x_p^k] and \theta_i = [\theta_{i0} \; \theta_{i1} \; \dots \; \theta_{ip}] is the coefficient vector of the local linear model of the i-th rule. Each input pattern has a membership degree associated with each region of the input space partition. These degrees are calculated through membership functions g_i(x^k), which vary according to the centers and covariance matrices of the fuzzy partition and are computed by:

g_i(x^k) = g_i^k = \frac{\alpha_i P[\, i \mid x^k \,]}{\sum_{q=1}^{M} \alpha_q P[\, q \mid x^k \,]} \qquad (2)

where the \alpha_i are positive coefficients satisfying \sum_{i=1}^{M} \alpha_i = 1, and P[\, i \mid x^k \,] is defined according to

P[\, i \mid x^k \,] = \frac{1}{(2\pi)^{p/2} \det(V_i)^{1/2}} \exp\left( -\frac{1}{2} (x^k - c_i) V_i^{-1} (x^k - c_i)^T \right) \qquad (3)

where det(·) is the determinant function. The model output y(k) = y^k, which represents the predicted value for the future time instant k, is calculated by means of a non-linear weighted average of the local outputs y_i^k and their respective membership degrees g_i^k, i.e.

y(x^k) = y^k = \sum_{i=1}^{M} g_i^k \, y_i^k \qquad (4)
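As an illustration of Eqs. (1)-(4) only (no code appears in the original paper), the following Python sketch evaluates the output of a first-order TS fuzzy system for one input vector; the array names and shapes are assumptions made for the example.

```python
import numpy as np

def fis_output(x, centers, covs, thetas, alphas):
    """One-step output of a first-order TS fuzzy system, Eqs. (1)-(4).

    x       : input vector, shape (p,)
    centers : rule centers c_i, shape (M, p)
    covs    : rule covariance matrices V_i, shape (M, p, p)
    thetas  : consequent coefficients theta_i, shape (M, p + 1)
    alphas  : mixing coefficients alpha_i, shape (M,), summing to one
    """
    M, p = centers.shape
    phi = np.concatenate(([1.0], x))          # phi^k = [1 x_1 ... x_p]
    y_local = thetas @ phi                    # Eq. (1): y_i^k = phi^k theta_i^T
    dens = np.empty(M)
    for i in range(M):
        d = x - centers[i]
        norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(covs[i]))
        dens[i] = np.exp(-0.5 * d @ np.linalg.inv(covs[i]) @ d) / norm  # Eq. (3)
    g = alphas * dens
    g = g / g.sum()                           # Eq. (2): normalized memberships g_i^k
    return float(g @ y_local), g              # Eq. (4): weighted average of local outputs
```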

III. Learning algorithm

The learning process is composed of two stages: an initialization stage and an evolving (sequential) learning stage.

A. Initialization

The model is initialized by applying the well-known Subtractive Clustering (SC) algorithm [17]. The main objective at this stage is the definition of an initial FIS structure that serves as a starting point for the next stage (sequential learning). To do this, a small number of input-output patterns constructed with the available historical data is used, and the SC algorithm determines an initial number of rules (M^0). The centers are initialized using the input part of the cluster centers given by the SC algorithm, and the initial FIS spreads are the same as those used by SC. The consequents at this stage are initialized as singletons, using the output part of the centers determined by the algorithm. Although the adaptive model proposed here could be initialized with a single rule built from the first input-output pattern presented during the sequential learning (like other evolving models, such as [18]), the initialization stage is suggested due to the nature of the second stage, which is mainly a recursive version of the offline EM algorithm detailed in [19] and is very sensitive to the initial state. Therefore, the initialization stage speeds up convergence and guarantees that the sequential learning starts from a locally optimal solution.
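The sketch below is a simplified, hedged rendering of the subtractive clustering step of [17] used here for initialization; the cluster radius, squash factor and stopping ratio are illustrative defaults rather than values reported in the paper, and the full accept/reject criteria of the original algorithm are omitted.

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, squash=1.5, stop_ratio=0.15):
    """Simplified subtractive clustering (Chiu, 1994) on normalized data X of shape (N, p)."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    potential = np.exp(-alpha * d2).sum(axis=1)           # potential of each data point
    centers = []
    first_peak = potential.max()
    while True:
        k = int(np.argmax(potential))
        if potential[k] < stop_ratio * first_peak:        # simplified stopping rule
            break
        centers.append(X[k].copy())
        # subtract the influence of the newly selected center from all potentials
        potential -= potential[k] * np.exp(-beta * ((X - X[k]) ** 2).sum(-1))
    return np.array(centers)
```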

B. Online learning

Dynamic adaptation presents advantages that range from automatic structure and parameter selection to adaptation to changes in the reasoning environment. Several papers in the literature, such as the D-FNN model proposed in [10] or the eTS model detailed in [18], demonstrate the efficiency and good performance of online learning algorithms. Hence, sequential learning considers FIS evolution via structural and parameter learning, and it is performed every time a new pattern is presented to the current FIS, avoiding retraining with all the historical data at each time instant. Adaptation at each iteration considers a window of size T over time; that is, the last T patterns will most strongly influence the model parameters and structure. As mentioned before, the recursive learning used by the evolving FIS is a recursive version of the offline EM algorithm, which is completely detailed in [19]. Based on the offline EM equations, each parameter \alpha_i, i = 1, ..., M, is adjusted according to:

\alpha_i = \frac{1}{N} \sum_{k=1}^{N} h_i^k \qquad (5)

where h_i^k is the posterior probability given by the EM algorithm for the k-th pattern and the i-th rule. Carefully observing Eq. (5), and considering a total of N input-output patterns, this equation can be rewritten as:

\alpha_i^N = \frac{1}{N} \sum_{k=1}^{N} h_i^k
           = \frac{1}{N} \left( \sum_{k=1}^{N-1} h_i^k + h_i^N \right)
           = \frac{1}{N} \left[ (N-1)\,\alpha_i^{N-1} + h_i^N \right]
           = \alpha_i^{N-1} + \frac{1}{N} \left[ h_i^N - \alpha_i^{N-1} \right] \qquad (6)

where \alpha_i^{N-1} is the estimated value of \alpha_i considering just the first N-1 patterns. As can be observed, Equation (6) is a recursive estimate of Equation (5). Following this procedure for a generic iteration k and a window size T, defined in this work by trial and error, recursive estimates for all the model parameters can be written as follows:

\alpha_i^{k+1} = \alpha_i^k + \frac{1}{T} \left[ h_i^k - \alpha_i^k \right] \qquad (7)

c_i^{k+1} = c_i^k + \frac{1}{\psi_i^{k+1}} \left[ x^k - c_i^k \right] \qquad (8)

V_i^{k+1} = V_i^k + \frac{1}{\psi_i^{k+1}} \left[ (x^k - c_i^k)^T (x^k - c_i^k) - V_i^k \right] \qquad (9)

where:

\frac{1}{\psi_i^{k+1}} = \frac{h_i^{k+1}}{\sum_{t=1}^{k+1} h_i^t} \qquad (10)
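To make the recursive premise updates concrete, here is a minimal Python sketch of Eqs. (7)-(10) for a single rule, assuming the posterior h_i^k has already been computed; it keeps the exact cumulative sum of posteriors, which the windowed approximation of Eqs. (11)-(12) discussed next could replace.

```python
import numpy as np

def update_premise(rule, x, h, T):
    """One recursive step of Eqs. (7)-(10) for a single rule.

    rule : dict with keys 'alpha', 'c', 'V', 'S' (S accumulates sum_t h_i^t)
    x    : current input vector, shape (p,)
    h    : posterior h_i^k of this rule for the current pattern
    T    : window size
    """
    rule['alpha'] += (h - rule['alpha']) / T                 # Eq. (7)
    rule['S'] += h                                           # exact cumulative sum of posteriors
    inv_psi = h / rule['S']                                  # Eq. (10): 1/psi_i^{k+1}
    d = x - rule['c']                                        # uses the old center c_i^k
    rule['c'] += inv_psi * d                                 # Eq. (8)
    rule['V'] += inv_psi * (np.outer(d, d) - rule['V'])      # Eq. (9)
    return rule
```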

An approximation of \sum_{t=1}^{k+1} h_i^t can be constructed considering a window of size T and a recursive form inspired by the adaptive learning of the fuzzy system detailed in [20]. Let S_i^{k+1} = \sum_{t=1}^{k+1} h_i^t and S_i(x^{k+1}) = h_i^{k+1}. Then S_i^{k+1} can be estimated as:

S_i^{k+1} \approx S_i(x^{k+1}) + \frac{T-1}{T} S_i^k \qquad (11)

which can be rewritten as:

S_i^{k+1} \approx S_i^k + \left[ S_i(x^{k+1}) - \frac{S_i^k}{T} \right] \qquad (12)

It is interesting to analyze Equation (12): S_i^k / T can be interpreted as an estimate of the mean value of h_i over the window of size T. Therefore, the higher its value, the more relevant the respective fuzzy rule will be for the next step. If S_i(x^{k+1}) takes low values over T steps, S_i^{k+1} will decrease and the probability of pruning the associated i-th rule will increase. To estimate \theta_i, a weighted recursive least squares (RLS) algorithm is applied, which considers a forgetting factor f_forget over time, as detailed in [21]. The equations of the RLS algorithm adapted to our problem are:

\theta_i^{k+1} = \theta_i^k + C_i^{k+1} \varphi^k h_i^k \left( y^k - y_i^k \right) \qquad (13)

where

C_i^{k+1} = \frac{1}{f_{forget}} \left[ C_i^k - \frac{h_i^k \, C_i^k \varphi^k (\varphi^k)^T C_i^k}{f_{forget} + h_i^k (\varphi^k)^T C_i^k \varphi^k} \right] \qquad (14)

and C_i^{k+1} is the covariance matrix associated with each \theta_i during the online adaptation. The forgetting factor f_forget ∈ (0, 1] was initially set in this paper to f_forget = 0.9; to guarantee stability, f_forget was then slowly increased so that, after a long time, f_forget → 1.0. The initial conditions for \theta_i^0, i = 1, ..., M, were given by the values obtained through the model initialization, while C_i^0 = δI, where δ = 10^4 and I is an identity matrix of dimension (p+1) × (p+1). After the initialization phase, online adaptation is carried out via structure modification, based on adding and pruning operators, and via parameter updating using Equations (7)-(14).

Adding: The criterion used to judge whether a new fuzzy rule should be generated is based on the if-part criterion, which verifies whether some existing fuzzy rule clusters the input vector. Assuming normally distributed input data with a confidence level of γ%, we can construct a confidence interval [c_i - z √diag(V_i), c_i + z √diag(V_i)], where diag(V_i) is the main diagonal of the covariance matrix V_i. In this paper we adopt a confidence level of γ = 72.86%, which requires a z value of 1.1, obtained from the normal distribution table. Hence γ = 72.86% is the central mass, leaving 13.57% of the probability excluded in each tail. That is:

\max_{i=1,\dots,M} P[\, i \mid x^k \,] > 0.1357 \qquad (15)

If this condition is not satisfied, it means that there is no rule that clusters this input vector, that is, the input pattern is not covered by any fuzzy partition. Hence, it is necessary to add a new rule to the structure, expanding the input space region detected during the initialization phase or before the current time instant k. In that case, a new rule is generated with the following initialization: c_{M+1}^{k+1} = x^k; S_{M+1}^{k+1} = 1.0; \theta_{M+1}^{k+1} = [\, y^k \; 0 \; \dots \; 0 \,]_{1×(p+1)}; V_{M+1}^{k+1} = 10^{-4} I, where I is a p × p identity matrix; and \alpha_{M+1}^{k+1} = 10^{-5}. Even though this last value is too small to interfere with the dynamics of the current structure, all the \alpha_i, i = 1, ..., M+1, are re-normalized, so that the sum of these coefficients always equals one.
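The adding operator of Eq. (15) together with the new-rule initialization can be sketched as follows; the rule dictionary layout, the initial permanence index, and the reuse of the C^0 = 10^4 I setting for new rules are assumptions made for the example.

```python
import numpy as np

def gaussian_density(x, c, V):
    """Eq. (3): cluster density P[i | x] for a rule with center c and covariance V."""
    p = x.size
    d = x - c
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(V))
    return np.exp(-0.5 * d @ np.linalg.inv(V) @ d) / norm

def maybe_add_rule(rules, x, y, threshold=0.1357):
    """Adds a new rule when no existing rule covers x, following Eq. (15) and the text."""
    p = x.size
    if max(gaussian_density(x, r['c'], r['V']) for r in rules) > threshold:
        return rules                                   # Eq. (15) satisfied: no new rule needed
    rules.append({
        'c': x.copy(),                                 # c_{M+1} = x^k
        'S': 1.0,                                      # cumulative posterior sum (assumed init)
        'theta': np.concatenate(([y], np.zeros(p))),   # theta_{M+1} = [y^k 0 ... 0]
        'V': 1e-4 * np.eye(p),                         # V_{M+1} = 1e-4 I, as reconstructed above
        'alpha': 1e-5,                                 # alpha_{M+1} = 1e-5
        'pi': 0,                                       # permanence index, see Eq. (16) (assumed start)
        'C': 1e4 * np.eye(p + 1),                      # RLS covariance (same init as the text, assumed)
    })
    total = sum(r['alpha'] for r in rules)
    for r in rules:                                    # re-normalize so that sum(alpha_i) = 1
        r['alpha'] /= total
    return rules
```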

Pruning: As can be observed in Equation (5), \alpha_i can be considered a measure of the importance of each fuzzy rule for the current topology when compared with the other rules, because \alpha_i is proportional to the sum of the posterior estimates of the membership functions g_i^k over the whole data set. Hence, a threshold \alpha_min is defined, so that every rule with \alpha_i < \alpha_min at a given iteration is pruned and eliminated from the current model structure. However, right after a new rule is created, its corresponding \alpha_i has a small value; if the pruning operator were applied immediately, the new rule would be eliminated before there was time to verify its relevance for the model structure. This problem is solved by the creation of a new index, called the index of permanence \pi_i. Every time a new rule is created, its respective \pi_i is also created, and as the rule is activated over time this index is incremented, that is:

\pi_i^{k+1} = \pi_i^k + 1 \qquad (16)

Thus, a rule is a candidate for pruning only if its \alpha_i is very small (\alpha_i < \alpha_min) and \pi_i^k > \kappa T, where \kappa > 0 and T is the same window size used during the sequential learning. This condition ensures that no new rule is pruned immediately after its creation, allowing it to adjust over a minimum period of time and avoiding useless and abrupt oscillations in the model structure.
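A hedged sketch of the pruning operator and the permanence index of Eq. (16); incrementing pi for every rule at each step and re-normalizing the alpha_i after pruning are simplifying assumptions, not details spelled out in the paper.

```python
def prune_rules(rules, alpha_min, kappa, T):
    """Pruning operator: Eq. (16) plus the alpha_i < alpha_min and pi_i > kappa*T test."""
    for r in rules:
        r['pi'] += 1                     # Eq. (16): permanence index grows at each step
    kept = [r for r in rules
            if not (r['alpha'] < alpha_min and r['pi'] > kappa * T)]
    total = sum(r['alpha'] for r in kept)
    for r in kept:                       # keep the mixing coefficients summing to one (assumed)
        r['alpha'] /= total
    return kept
```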

Fig. 2. (a) Daily US Dollar exchange rate from January 2000 to December 2010; (b) US Dollar exchange rate returns.

IV. Case study: Value at Risk

A. Data

Our case study is based on daily values of the commercial exchange rate of Brazilian Reals (R$) per U.S. Dollar (REAL/USD), registered from January 2000 to December 2010. The historical records are shown in Fig. 2-(a), while Fig. 2-(b) shows the corresponding returns. Summary statistics and the histogram of the REAL/USD returns are presented in Fig. 3, where a high kurtosis coefficient can be observed. The Jarque-Bera statistic reveals that the return series is non-normal at a 99% confidence level. The last thousand samples of the historical observations were used for validation purposes, whereas the first ones were used for model specification and optimization. We also observe clusters of volatility, which is an indication of heteroscedasticity and, consequently, that the data is a candidate for GARCH modelling.

B. Value-at-Risk

As already mentioned in Section I, despite the several controversies reported in the literature, VaR is nowadays a standard tool for quantifying market risk, which is just one of the possible types of risk in financial markets [22], measuring the worst expected loss at a given confidence level. Returns are calculated according to:

R_t = \ln P_t - \ln P_{t-1} \qquad (17)

where P_t represents the asset price and R_t is the log-return at time instant t. At time t, we are interested in measuring the risk of a financial position over the next h periods. If ΔV(h) denotes a random variable indicating the change in the value of the asset from time t to time t + h, then we can define its conditional distribution function (CDF) as F_h(x). Therefore, the VaR for a long position over the time horizon h with probability p is defined by

\mathrm{Prob}[\, \Delta V(h) \le VaR \,] = F_h(VaR) = p \qquad (18)

In the case of a long position, which is the hypothesis adopted in this case study, a loss occurs when ΔV(h) < 0; thus, the VaR defined in (18) assumes a negative value. The interpretation of p is that, over a large number of trading days, the holder will face a loss greater than or equal to the VaR p% of the time over the horizon h. As observed, for a long position the left tail of the CDF is the relevant one. Since in practice this CDF is unknown, VaR studies focus on its estimation, with emphasis on the behavior of its tail. Estimating the CDF of the series is beyond the scope of this paper. Therefore, although we observed that the series analyzed here is non-normal, we assume for simplicity that this unknown CDF is that of the normal distribution, since this is the most common assumption in the literature.
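For concreteness, here is a short sketch of the log-return computation of Eq. (17) and of a normal-quantile VaR for a long position, anticipating Eqs. (21)-(22) below; the price vector is illustrative, and the sample standard deviation merely stands in for a model-based volatility forecast.

```python
import numpy as np
from scipy.stats import norm

prices = np.array([1.80, 1.82, 1.79, 1.85, 1.83])   # illustrative REAL/USD quotes
returns = np.diff(np.log(prices))                   # Eq. (17): R_t = ln P_t - ln P_{t-1}

p = 0.05                                            # 5% confidence level for a long position
z_p = norm.ppf(1.0 - p)                             # critical value z_p (about 1.645)
sigma_hat = returns.std(ddof=1)                     # stand-in for a one-step volatility forecast
var_long = -z_p * sigma_hat                         # negative value, consistent with Eq. (18)
print(f"one-step VaR at the {1 - p:.0%} level: {var_long:.4f}")
```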

Fig. 3. REAL/USD returns summary statistics.

In order to define the VaR of our assets at a given confidence level p, we need a one-step-ahead forecast of the volatility of the returns (h = 1). For doing so, we use the proposed model (AdaFIS) as well as an ARMA(p,q)-GARCH(r,s) model for benchmark purposes.

C. ARMA-GARCH and AdaFIS approaches

A general representation of an ARMA(p,q)-GARCH(r,s) model is given by:

R_t = \phi_0 + \sum_{i=1}^{p} \phi_i R_{t-i} + \sum_{j=1}^{q} \theta_j a_{t-j} \qquad (19)

a_t = \sigma_t \varepsilon_t

\sigma_t^2 = \alpha_0 + \sum_{i=1}^{r} \alpha_i a_{t-i}^2 + \sum_{j=1}^{s} \beta_j \sigma_{t-j}^2 \qquad (20)

where the errors \varepsilon_t follow a normal distribution N(0,1). Since these equations provide estimates of the future return and of the conditional variance \sigma_t^2, they can be used for a one-step-ahead VaR forecast such that

VaR_p = \hat{R}_t - z_p \hat{\sigma}_t \qquad (21)

where z_p is the critical value from the normal distribution table at the p% confidence level. The order (p,q) of the ARMA part was specified so that the residuals of the return equation were non-autocorrelated (white noise). Several configurations of the GARCH part were then evaluated, ranking each model according to the Schwarz and Akaike criteria over a grid of [0,4] for r and s; the model with the lowest indicators is selected as the most representative for the dataset. The most adequate configuration for the GARCH part was the GARCH(1,1).

In the case of the AdaFIS approach, we aim to forecast the one-day-ahead volatility of the returns directly. After forecasting the volatility, the VaR is computed as

VaR_p = -z_p \hat{\sigma}_t \qquad (22)

For doing so, the FIS uses the following representation:

\hat{\sigma}_{t+1}^2 = \hat{R}_{t+1}^2 = f( R_t^2, R_{t-1}^2, \dots, R_{t-m}^2 ) + \varepsilon_t \qquad (23)

where the number of lags m considered as input variables of the AdaFIS model was defined by analyzing the partial autocorrelation function of the squared returns, considering at most the first four autoregressive lags as possible inputs.
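The sketch below shows one way to build the lagged squared-return inputs of Eq. (23) and to turn volatility forecasts into the VaR estimates of Eqs. (21)-(22); the function and variable names are assumptions made for this example, not code from the paper.

```python
import numpy as np
from scipy.stats import norm

def adafis_dataset(returns, m):
    """Input/target pairs for Eq. (23): squared return at t+1 explained by m+1 lagged squares."""
    r2 = returns ** 2
    X = np.column_stack([r2[m - j:len(r2) - 1 - j] for j in range(m + 1)])  # [R_t^2 ... R_{t-m}^2]
    y = r2[m + 1:]                                                          # target R_{t+1}^2
    return X, y

def var_forecasts(r_hat, sigma_hat, p=0.05):
    """One-step VaR from the two approaches, Eqs. (21) and (22)."""
    z_p = norm.ppf(1.0 - p)
    var_garch = r_hat - z_p * sigma_hat     # Eq. (21), uses the ARMA mean forecast
    var_adafis = -z_p * sigma_hat           # Eq. (22), volatility forecast only
    return var_garch, var_adafis
```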

D. Performance metrics

Based on the research detailed in [4], two loss functions are evaluated in order to verify model performance: the violation ratio and the average square magnitude function. The violation ratio is the percentage of occurrences of an actual loss greater than the estimated maximum loss in the VaR framework, and it is calculated as

VR = \frac{1}{N} \sum_{k=1}^{N} H_k \qquad (24)

where H_k = 1 if R_k < VaR_k and H_k = 0 if R_k ≥ VaR_k; VaR_k is the one-step-ahead forecast VaR for day k considering a 5% confidence level, and N is the number of observations in the out-of-sample period. The average square magnitude function suggested in [4] takes the size of the possible default into account, measuring the average squared cost of the exceptions:

E = \frac{1}{V} \sum_{i=1}^{V} D_i \qquad (25)

where V is the number of exceptions of the respective model, D_i = (R_i - VaR_i)^2 when R_i < VaR_i, and D_i = 0 when R_i ≥ VaR_i.

E. Analysis of results

The econometric model was adjusted using the software EViews. The final ARMA-GARCH model is given by the following equations:

R_t = 0.141982 \, R_{t-1} + a_t \qquad (26)

\sigma_t^2 = 3.56 \times 10^{-6} + 0.154037 \, a_{t-1}^2 + 0.808768 \, \sigma_{t-1}^2 \qquad (27)

Fig. 4. Returns and estimated VaR for the REAL/USD series: (a) AdaFIS; (b) AR(1)-GARCH(1,1).
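A minimal sketch of the two loss functions of Eqs. (24)-(25), assuming returns and var_forecasts are aligned out-of-sample arrays.

```python
import numpy as np

def violation_ratio(returns, var_forecasts):
    """Eq. (24): fraction of days on which the realized return falls below the forecast VaR."""
    hits = returns < var_forecasts
    return hits.mean()

def avg_square_magnitude(returns, var_forecasts):
    """Eq. (25): average squared exceedance, computed over the violation days only."""
    exceed = returns < var_forecasts
    if not exceed.any():
        return 0.0
    d = (returns[exceed] - var_forecasts[exceed]) ** 2
    return d.mean()
```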

The dataset used for adjustment was the same used by the AdaFIS model during the initialization stage. After that, the model was run in online mode. Parameter T, which represents the window size over time, was set to 28, while \alpha_min = 0.001 and f_forget = 0.925. Based on the autocorrelation and partial autocorrelation functions, the AdaFIS considered the first four lags of the squared returns for composing the input vector. Results for the VR and E criteria are presented in Table I. Evaluating the VR loss function, we notice that the AdaFIS outperforms the GARCH model in this case study; nevertheless, the latter also yields an adequate violation ratio, taking into account the 5% confidence level adopted. Table I also shows the E loss function. When different models attain similar or identical hit rates, this metric may help to discriminate between them; here, it complements the results quantified by the VR metric. Given the confidence level, both metrics are conclusive, since the E value achieved by the AdaFIS was the lowest. In general terms, we may argue that, for long positions, the AdaFIS has not only the best VR but also a small average magnitude for its violations. Fig. 4 presents the returns and the VaR estimates obtained with the AdaFIS and ARMA-GARCH approaches, respectively.
TABLE I
Loss functions.

Case study    Model     VR (%)    E (%)
REAL/USD      GARCH      3.70     0.008
              AdaFIS     1.80     0.006

V. Conclusions

The accurate estimation of volatility is of great importance for the control and management of the risk borne by business activities. This paper presented an adaptive fuzzy inference system (AdaFIS) applied to volatility forecasting and Value-at-Risk estimation. The main contribution of this work is the development of a sequential learning procedure that estimates the model parameters in parallel with the definition of the model structure. An important feature of the proposed model is that it does not require retraining of the entire model each time new data from the time series becomes available, since learning is carried out in an online fashion, providing compact structures through a fast learning procedure, which is a great advantage in terms of processing time and computational effort. The forecasting exercise against a traditional and well-established econometric benchmark is favourable to the proposal of this paper, showing the AdaFIS as a promising technique for time series modeling and forecasting. Further research includes the reduction of the number of model parameters, as well as an adaptive criterion for dynamic input selection and multi-step-ahead predictions. We would also like to decrease the number of key parameters to be set, defining automatic values for \kappa and T, as well as simplifying the definition given for the membership degrees g_i^k, increasing model transparency. Another possible extension of the work presented here is the construction of interval forecasts.

References

[1] X. Chen, "Neural network based models for value-at-risk analysis with applications in emerging markets," Ph.D. dissertation, Department of Management Sciences, City University of Hong Kong, 2009.
[2] H. Lu, X. Yu, J. Zhu, X. Zhao, and N. Cheng, "Value-at-risk forecasting with combined neural network model," in ICNC'10, 2010, pp. 746-750.
[3] P. Jorion, Value-at-Risk. McGraw-Hill, 2001.

[4] C. Dunis, J. Laws, and G. Sermpinis, "Modelling commodity value at risk with higher order neural networks," Applied Financial Economics, vol. 20, no. 7, pp. 585-600, 2010.
[5] T. Bollerslev, "Generalized autoregressive conditional heteroskedasticity," Journal of Econometrics, vol. 31, no. 3, pp. 307-327, 1986.
[6] S. A. Hamid and Z. Iqbal, "Using neural networks for forecasting volatility of S&P 500 index futures prices," Journal of Business Research, vol. 57, no. 10, pp. 1116-1125, 2004, selected papers from the third Retail Seminar of the SMA.
[7] I. Luna, L. Maciel, R. L. F. da Silveira, and R. Ballini, "Estimating the Brazilian central bank's reaction function by fuzzy inference system," in IPMU (2), 2010, pp. 324-333.
[8] R. Donaldson and M. Kamstra, "An artificial neural network-GARCH model for international stock return volatility," Journal of Empirical Finance, vol. 4, no. 1, pp. 17-46, 1997.
[9] J. Dhar, P. Agrawal, V. Singhal, A. Singh, and R. K. Murmu, "Comparative study of volatility forecasting between ANN and hybrid models for Indian market," International Research Journal of Finance and Economics, no. 45, pp. 68-79, 2010.
[10] M. J. Er and S. Wu, "A fast learning algorithm for parsimonious fuzzy neural systems," Fuzzy Sets and Systems, vol. 126, pp. 337-351, 2002.
[11] P. Angelov and D. Filev, "Simpl_eTS: A simplified method for learning evolving Takagi-Sugeno fuzzy models," in Proceedings of the IEEE International Conference on Fuzzy Systems, 2005, pp. 1068-1073.
[12] G. Leng, T. McGinnity, and G. Prasad, "An approach for on-line extraction of fuzzy rules using a self-organising fuzzy neural network," Fuzzy Sets and Systems, vol. 150, no. 2, pp. 211-243, 2005.
[13] H.-J. Rong, N. Sundararajan, G.-B. Huang, and P. Saratchandran, "Sequential Adaptive Fuzzy Inference System (SAFIS) for nonlinear system identification and prediction," Fuzzy Sets and Systems, no. 157, pp. 1260-1275, 2006.

[14] I. Luna, S. Soares, and R. Ballini, "An adaptive hybrid model for monthly streamflow forecasting," in Proceedings of the IEEE International Conference on Fuzzy Systems, 2007, pp. 1-6.
[15] R. Ballini, A. R. R. Mendonça, and F. Gomide, "Evolving fuzzy modeling of sovereign bonds," Journal of Financial Decision Making, vol. 5, no. 2, December 2009.
[16] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control," IEEE Transactions on Systems, Man and Cybernetics, no. 1, pp. 116-132, January/February 1985.
[17] S. Chiu, "A cluster estimation method with extension to fuzzy model identification," in Proceedings of the IEEE International Conference on Fuzzy Systems, vol. 2, June 1994, pp. 1240-1245.
[18] P. P. Angelov and D. P. Filev, "An approach to online identification of Takagi-Sugeno fuzzy models," IEEE Transactions on Systems, Man and Cybernetics - Part B, vol. 34, no. 1, pp. 484-498, February 2004.
[19] R. Jacobs, M. Jordan, S. Nowlan, and G. Hinton, "Adaptive mixture of local experts," Neural Computation, vol. 3, no. 1, pp. 79-87, 1991.
[20] L. Wang, Adaptive Fuzzy Systems and Control. Prentice Hall, 1994.
[21] S. Haykin, Kalman Filtering and Neural Networks. John Wiley & Sons, Inc., 2001.
[22] Y. Liu, "Value-at-risk model combination using artificial neural networks," Emory University Working Paper Series, 2005.
