Applied Modeling of Hydrologic Time Series

by J. D. Salas, J. W. Delleur, V. Yevjevich and W. L. Lane

WATER RESOURCES PUBLICATIONS

DEDICATION
To Jose and Olinda
To my family
To my family
To Marily, Mark and Tim

For information and correspondence: WATER RESOURCES PUBLICATIONS, P.O. Box 2841, Littleton, Colorado 80161, U.S.A.

About the Authors: Jose D. Salas is Associate Professor of Civil Engineering at Colorado State University, Fort Collins, Colorado. Jacques W. Delleur is Professor of Civil Engineering at Purdue University, Lafayette, Indiana. Vujica M. Yevjevich is Professor of Civil Engineering at Colorado State University and Research Professor and Director at the International Water Resources Institute, School of Engineering and Applied Science, George Washington University, Washington, D.C. William L. Lane is Hydrologist, Division of Planning at the Water and Power Resources Service, Pacific Northwest Regional Office, Boise, Idaho.

ISBN 0-918334-37-3
U.S. Library of Congress Catalog Card Number 80-53334

Copyright © 1980 by Water Resources Publications. All rights reserved. Printed in the United States of America. The text of this publication may not be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without a written permission from the Publisher.

This publication is printed and bound by BookCrafters, Inc., Chelsea, Michigan, U.S.A.

TABLE OF CONTENTS

Chapter 1 INTRODUCTION
  1.1 STOCHASTIC PROCESSES AND TIME SERIES
  1.2 TIME SERIES MODELS
  1.3 TIME SERIES MODELING
  1.4 PHYSICAL BASIS OF TIME SERIES MODELING IN HYDROLOGY
  1.5 REPRODUCTION OF HISTORICAL STATISTICAL CHARACTERISTICS
  1.6 TIME SERIES MODELS IN HYDROLOGY
  1.7 TIME SERIES MODELING IN HYDROLOGY
  1.8 APPLICABILITY OF TIME SERIES MODELING IN HYDROLOGY
  APPENDIX A1.1 DEFINITIONS, TERMS AND NOTATIONS
  APPENDIX A1.2 ELEMENTARY STATISTICAL PRINCIPLES
  APPENDIX A1.3 ELEMENTARY MATRIX DEFINITIONS AND COMPUTATIONS
  REFERENCES

Chapter 2 CHARACTERISTICS OF HYDROLOGIC SERIES
  2.1 TYPE OF HYDROLOGIC SERIES
    2.1.1 TIME SERIES
    2.1.2 LINE SERIES
    2.1.3 COUNTING SERIES
  2.2 GENERAL PROPERTIES OF HYDROLOGIC TIME SERIES
    2.2.1 COMPONENTS OF HYDROLOGIC SERIES
    2.2.2 BASIC STATISTICAL CHARACTERISTICS OF TIME SERIES
    2.2.3 COMPLEX CHARACTERISTICS OF PERIODIC TIME SERIES
    2.2.4 DROUGHT RELATED CHARACTERISTICS OF TIME SERIES
    2.2.5 STORAGE RELATED CHARACTERISTICS OF TIME SERIES
    2.2.6 NONHOMOGENEITY AND INCONSISTENCY IN HYDROLOGIC SERIES
  2.3 CHARACTERISTICS OF ANNUAL TIME SERIES
  2.4 CHARACTERISTICS OF PERIODIC TIME SERIES
  2.5 CHARACTERISTICS OF MULTIVARIATE TIME SERIES
  2.6 CHARACTERISTICS OF INTERMITTENT TIME SERIES
  REFERENCES

Chapter 3 STATISTICAL PRINCIPLES AND TECHNIQUES FOR TIME SERIES MODELING
  3.1 BASIC ESTIMATION TECHNIQUES
    3.1.1 PROPERTIES OF ESTIMATORS
    3.1.2 METHOD OF MOMENTS
    3.1.3 METHOD OF LEAST SQUARES
    3.1.4 METHOD OF MAXIMUM LIKELIHOOD
    3.1.5 JOINT ESTIMATION OF PARAMETERS
    3.1.6 PARAMETER ESTIMATION BY REGIONALIZATION
  3.2 NORMALIZATION OF TIME SERIES VARIABLES
    3.2.1 NORMALIZATION OF ANNUAL TIME SERIES
    3.2.2 NORMALIZATION OF PERIODIC TIME SERIES
    3.2.3 REMARKS
  3.3 ESTIMATION OF PERIODIC PARAMETERS BY FOURIER SERIES
    3.3.1 JUSTIFICATION OF USING FOURIER SERIES
    3.3.2 ESTIMATION OF FOURIER SERIES COEFFICIENTS
    3.3.3 SELECTION OF SIGNIFICANT HARMONICS AND FOURIER COEFFICIENTS
  3.4 ESTIMATION OF PARAMETERS OF MULTIVARIATE MODELS
  3.5 TESTS OF GOODNESS OF FIT
    3.5.1 TEST OF INDEPENDENCE
    3.5.2 TESTS OF NORMALITY
  3.6 PRESERVATION OF STATISTICS AND PARSIMONY OF PARAMETERS
  3.7 GENERATION AND FORECASTING
    3.7.1 GENERATION OF SYNTHETIC SAMPLES
    3.7.2 USE OF MODELS FOR FORECASTING
  REFERENCES

Chapter 4 AUTOREGRESSIVE MODELING
  4.1 DESCRIPTION OF AR MODELS
    4.1.1 MATHEMATICAL FORMULATION OF AR MODELS
    4.1.2 PROPERTIES OF AR MODELS
  4.2 AR MODELING OF ANNUAL TIME SERIES
    4.2.1 ANNUAL AR MODELS
    4.2.2 PARAMETER ESTIMATION FOR ANNUAL AR MODELS
    4.2.3 GOODNESS OF FIT FOR ANNUAL AR MODELS
    4.2.4 GENERATION AND FORECASTING USING ANNUAL AR MODELS
    4.2.5 SUMMARIZED AR MODELING PROCEDURE FOR ANNUAL SERIES
    4.2.6 EXAMPLE OF AR MODELING OF ANNUAL SERIES
    4.2.7 LIMITATIONS OF ANNUAL AR MODELING
    4.2.8 PRACTICAL APPLICATIONS OF ANNUAL AR MODELS
  4.3 AR MODELING OF PERIODIC TIME SERIES
    4.3.1 PERIODIC AR MODELS
    4.3.2 PARAMETER ESTIMATION FOR PERIODIC AR MODELS
    4.3.3 GENERATION USING PERIODIC AR MODELS
    4.3.4 SUMMARIZED AR MODELING PROCEDURE FOR PERIODIC SERIES
    4.3.5 EXAMPLE OF AR MODELING OF PERIODIC SERIES
    4.3.6 LIMITATIONS OF PERIODIC AR MODELING
    4.3.7 PRACTICAL APPLICATIONS OF PERIODIC AR MODELS
  APPENDIX A4.1 AUTOCORRELATION FUNCTION OF AR(p) MODELS
  APPENDIX A4.2 PARTIAL AUTOCORRELATION FUNCTION OF AR(p) MODELS
  APPENDIX A4.3 ANNUAL FLOWS OF THE GOTA RIVER, SWEDEN
  REFERENCES

Chapter 5 AUTOREGRESSIVE-MOVING AVERAGE MODELING
  5.1 DESCRIPTION OF ARMA MODELS
    5.1.1 MATHEMATICAL FORMULATION OF ARMA MODELS
    5.1.2 PROPERTIES OF ARMA MODELS
  5.2 ARMA MODELING OF ANNUAL TIME SERIES
    5.2.1 ANNUAL ARMA MODELS
    5.2.2 PARAMETER ESTIMATION FOR ANNUAL ARMA MODELS
    5.2.3 GOODNESS-OF-FIT FOR ANNUAL ARMA MODELS
    5.2.4 GENERATION USING ANNUAL ARMA MODELS
    5.2.5 FORECASTING USING ANNUAL ARMA MODELS
    5.2.6 SUMMARIZED ARMA MODELING PROCEDURE FOR ANNUAL SERIES
    5.2.7 EXAMPLES OF ARMA MODELING FOR GENERATION AND FORECASTING ANNUAL TIME SERIES
    5.2.8 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF ARMA MODELING OF ANNUAL SERIES
  5.3 ARMA MODELING OF PERIODIC TIME SERIES
    5.3.1 PERIODIC ARMA MODELS
    5.3.2 PARAMETER ESTIMATION FOR PERIODIC ARMA MODELS
    5.3.3 GOODNESS OF FIT FOR PERIODIC ARMA MODELS
    5.3.4 SUMMARIZED ARMA MODELING PROCEDURE FOR PERIODIC SERIES
    5.3.5 EXAMPLES OF ARMA MODELING OF PERIODIC SERIES
    5.3.6 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF ARMA MODELING OF PERIODIC SERIES
  APPENDIX A5.1 COMPUTER PROGRAMS
  APPENDIX A5.2 COMPUTER PROGRAM USED IN THE ANNUAL SERIES EXAMPLE, SEC. [illegible]
  APPENDIX A5.3 CALCULATOR PROGRAM
  APPENDIX A5.4 COMPUTER PROGRAM USED IN MONTHLY SERIES EXAMPLE, SEC 3.5
  REFERENCES

Chapter 6 AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELING
  6.1 DESCRIPTION OF ARIMA MODELS
    6.1.1 THE DIFFERENCING OPERATION
    6.1.2 THE ARIMA MODEL
  6.2 SIMPLE ARIMA MODELING OF TIME SERIES
    6.2.1 THE SIMPLE ARIMA MODEL
    6.2.2 PARAMETER ESTIMATION FOR SIMPLE ARIMA MODELS
    6.2.3 GOODNESS OF FIT FOR SIMPLE ARIMA MODELS
    6.2.4 SUMMARIZED PROCEDURE FOR SIMPLE ARIMA MODELING
    6.2.5 EXAMPLE OF SIMPLE ARIMA MODELING
    6.2.6 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF SIMPLE ARIMA MODELS
  6.3 MULTIPLICATIVE ARIMA MODELING OF PERIODIC TIME SERIES
    6.3.1 THE MULTIPLICATIVE ARIMA MODEL
    6.3.2 PARAMETER ESTIMATION FOR MULTIPLICATIVE ARIMA MODELS
    6.3.3 GOODNESS OF FIT FOR MULTIPLICATIVE ARIMA MODELS
    6.3.4 SUMMARIZED PROCEDURE FOR MULTIPLICATIVE ARIMA MODELING
    6.3.5 EXAMPLES OF MULTIPLICATIVE ARIMA MODELING
    6.3.6 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF MULTIPLICATIVE ARIMA MODELS
    6.3.7 COMPARISON AND LIMITATIONS OF ARMA AND ARIMA MODELS
  APPENDIX A6.1 COMPUTER PROGRAMS
  APPENDIX A6.2 COMPUTER PROGRAM USED IN SIMPLE ARIMA MODELING, EXAMPLE 6.
  APPENDIX A6.3 PROGRAM UNESTM
  APPENDIX A6.4 PROGRAM UNESTM AND DATA INPUT FOR MULTIPLICATIVE ARIMA MODELING, EXAMPLE 6.3.5
  REFERENCES

Chapter 7 MULTIVARIATE MODELING OF HYDROLOGIC TIME SERIES
  7.1 DESCRIPTION OF MULTIVARIATE TIME SERIES MODELS
    7.1.1 GENERAL MATHEMATICAL MODELS
    7.1.2 PROPERTIES OF MULTIVARIATE MODELS
  7.2 MULTIVARIATE MODELING OF ANNUAL SERIES
    7.2.1 MULTIVARIATE AUTOREGRESSIVE AR(1) AND AR(2) MODELS
    7.2.2 APPROXIMATE MULTIVARIATE AUTOREGRESSIVE MOVING AVERAGE ARMA(p,q) MODEL
    7.2.3 GOODNESS OF FIT FOR MULTIVARIATE ANNUAL MODELS
    7.2.4 SUMMARIZED MODELING PROCEDURE FOR MULTIVARIATE ANNUAL SERIES
    7.2.5 EXAMPLE OF MODELING MULTIVARIATE ANNUAL TIME SERIES
    7.2.6 LIMITATIONS OF MULTIVARIATE ANNUAL MODELING
    7.2.7 PRACTICAL APPLICATIONS OF ANNUAL MULTIVARIATE MODELS
  7.3 MULTIVARIATE MODELING OF PERIODIC TIME SERIES
    7.3.1 MULTIVARIATE AR MODELS
    7.3.2 MULTIVARIATE ARMA MODELS
    7.3.3 GOODNESS OF FIT FOR MULTIVARIATE PERIODIC MODELS
    7.3.4 SUMMARIZED MULTIVARIATE MODELING PROCEDURE FOR PERIODIC SERIES
    7.3.5 EXAMPLE OF MODELING MULTIVARIATE PERIODIC TIME SERIES
    7.3.6 LIMITATIONS OF MULTIVARIATE PERIODIC MODELING
    7.3.7 PRACTICAL APPLICATIONS OF PERIODIC MULTIVARIATE MODELS
  APPENDIX A7.1 TABLES OF TIME SERIES DATA USED IN THE EXAMPLES OF CHAPTER 7
  REFERENCES

Chapter 8 DISAGGREGATION MODELING
  8.1 DESCRIPTION OF DISAGGREGATION MODELS
    8.1.1 GENERAL DISAGGREGATION MODEL
    8.1.2 SINGLE SITE TEMPORAL DISAGGREGATION MODELS
    8.1.3 MULTISITE TEMPORAL DISAGGREGATION MODELS
    8.1.4 SINGLE SITE HIGHER ORDER AR MODELS
    8.1.5 MULTIVARIATE HIGHER ORDER AR MODELS
    8.1.6 SPATIAL DISAGGREGATION MODEL
  8.2 PROPERTIES OF DISAGGREGATION MODELS
    8.2.1 PRESERVATION OF EXPECTED VALUES BY DISAGGREGATION MODELS
    8.2.2 PRESERVATION OF ADDITIVITY BY DISAGGREGATION MODELS
    8.2.3 PRESERVATION OF COVARIANCES AND VARIANCES BY DISAGGREGATION MODELS
  8.3 PARAMETER ESTIMATION FOR DISAGGREGATION MODELS
    8.3.1 PARAMETER ESTIMATION FOR THE LINEAR DEPENDENCE MODEL
    8.3.2 PARAMETER ESTIMATION FOR THE BASIC TEMPORAL DISAGGREGATION MODEL
    8.3.3 PARAMETER ESTIMATION FOR THE EXTENDED TEMPORAL DISAGGREGATION MODEL
    8.3.4 PARAMETER ESTIMATION FOR THE CONDENSED TEMPORAL DISAGGREGATION MODEL
    8.3.5 PARAMETER ESTIMATES FOR THE SPATIAL DISAGGREGATION MODEL
  8.4 GOODNESS OF FIT FOR DISAGGREGATION MODELS
  8.5 SUMMARIZED MODELING PROCEDURE FOR DISAGGREGATION MODELS
  8.6 GENERATION AND FORECASTING USING DISAGGREGATION MODELS
  8.7 EXAMPLE OF DISAGGREGATION MODELING
  8.8 LIMITATIONS OF DISAGGREGATION MODELING
  8.9 PRACTICAL APPLICATIONS OF DISAGGREGATION MODELS
  APPENDIX A8.1 ESTIMATION OF COVARIANCES
  REFERENCES

Chapter 9 CONSIDERATIONS IN MODEL APPLICATIONS
  9.1 PRETREATMENT OF HISTORICAL DATA
    9.1.1 DATA COMPILATION
    9.1.2 DATA FILL-IN AND EXTENSION
    9.1.3 REDUCTION OF DATA TO NATURAL CONDITIONS
  9.2 MODEL SELECTION
    9.2.1 IMPORTANCE OF HISTORICAL STATISTICS
    9.2.2 PRESERVATION OF HISTORICAL STATISTICS
    9.2.3 PURPOSE FOR GENERATION
    9.2.4 SENSITIVITY OF RESULTS
    9.2.5 REGIONAL ANALYSIS
  9.3 MODEL APPLICATIONS
    9.3.1 RESERVOIR SIZING STUDIES
    9.3.2 RESERVOIR OPERATION STUDIES
    9.3.3 BASIN-WIDE STUDIES
  9.4 MODEL LIMITATIONS
    9.4.1 SHORT- AND LONG-TERM PERSISTENCE
    9.4.2 ANNUAL MODEL LIMITATIONS
    9.4.3 PERIODIC MODEL LIMITATIONS
    9.4.4 DISAGGREGATION MODEL LIMITATIONS
  9.5 CLOSING REMARKS
INDEX BY SUBJECT
INDEX BY AUTHOR

PREFACE

The purpose of this book is to present the application of time series analysis for modeling hydrologic time series for the use of practicing engineers. The book attempts to bridge the gap between complexities of the research literature and the oversimplifications that can be found in some elementary textbooks and handbooks.

The concepts of random variables and random phenomena have been used in the field of hydrology since the beginning of the 20th Century. About the same time, statistics and probability theories were being applied to the analysis of river flow sequences. During the 1950's, such early concepts and applications were extended, introducing the idea of generating streamflow samples by using tables of normal random numbers. However, it was not until the beginning of the 1960's that the formal development of stochastic modeling began with the introduction and application of autoregressive (Markov) models to seasonal and annual hydrologic time series. Since then, a great deal of work has been done and published. Research on hydrologic time series has been aimed at studying the main statistical characteristics, providing physical justification to some stochastic models, developing new and/or alternative models, improving the estimates of model parameters, developing new or improving existing modeling procedures, improving tests of goodness of fit, developing procedures on dealing with model and parameter uncertainties and studying the sensitivity of models and model parameters in applied hydrology.

Although the development of time series modeling in hydrology has reached some degree of sophistication, unfortunately most time series modeling in practice is still generally based on simplified methods.
This usually involves the selection of the time series model (mostly an autoregressive model) in advance, and the estimation of its parameters by the method of moments without verifying constraints and also without testing the goodness of fit of the model or comparing it with the competing models. It appears that one of the main reasons for this state of the art is that although many statistical books are available with modeling techniques and many research papers have been published in several journals (Water Resources Research, Journal of Hydrology, Journal of ASCE Hydraulics Division, etc.), they appear to be impractical for most practicing engineers due to mathematical complexities. Another reason simplified methods are used is that the available textbooks, monographs and handbooks in hydrology, which include stochastic modeling, are either too cumbersome to read or oversimplified. Therefore, most practicing engineers either find difficulties in following the research oriented literature, which was not written for them, or tend to follow a very simplified and limited modeling approach.

This book covers the steps involved in the application of time series analysis for modeling hydrologic time series. The models considered have been carefully limited to those which are inferred by the authors to be of the most use and promise to practitioners. An ever-present goal has been to simplify the presentation and to avoid saturating the reader with every possible model and technique. It is readily admitted that this approach may mean that some of the very best, but as yet unused, unrecognized or unproven models and techniques have not been presented. Since this book is oriented to practitioners and not to researchers, this limitation is unavoidable. The choice of which models have been developed adequately for use by practitioners was a subjective choice on the part of the authors of this book.
This book is the outgrowth of lecture notes prepared for the "Computer Workshop in Statistical Hydrology" held at Colorado State University, in July, 1978. Discussions with J. C. Schaake, T. E. Croley, II, D. C. Boes, E. Benzeden, and R. A. Smith aided in the shaping of the contents of this book. A draft of this book was used as a text for the graduate level course "Stochastic Processes in Hydrology" during Spring, 1980 and for the summer course "Statistical Computer Techniques in Hydrology and Water Resources," July, 1980, both taught at Colorado State University. Part of the draft was also used as a graduate level course "Statistical Hydrology" at Purdue University during Spring, 1980.

Gratitude is extended to participants of the two summer courses as well as to the graduate students of the class of Spring, 1980, for their comments and suggestions which improved the text. Gratitude is also extended to the Department of Civil Engineering at Colorado State University, the School of Civil Engineering at Purdue University, the School of Engineering and Applied Science at George Washington University, and the Engineering and Research Center, Water and Power Resources Service (previously Bureau of Reclamation), U.S. Department of Interior, for giving us the opportunity to be involved in teaching, research and practice. Without such experience this book would have been difficult to write. Acknowledgment is also due to the National Science Foundation, Office of Water Resources Research and Technology, and Colorado State University Experiment Station for providing us with research support in the use of statistics, probability theory and stochastic processes in hydrology, research experience which gave us insight into the various aspects of modeling hydrologic time series.

J. D. Salas
J. W. Delleur
V. Yevjevich
W. L. Lane

Chapter 1
INTRODUCTION

Most technical literature does little to ease the experience a beginner has in applying stochastic techniques. The jargon,
mathematical intricacies and errors common to literature on stochastic hydrology, plus the constant contesting of rival approaches, make the chance of success of any initial attempt very small. This book is an attempt to simplify the presentation of stochastic approaches to facilitate the understanding and application by practitioners.

A major aim of this book is to cover in an easy-to-understand manner the analysis and modeling of hydrologic time series. Only the most commonly used models are covered. This chapter introduces some of the main concepts, definitions and notations which are used throughout the rest of the book. Chapter 2 deals with the analysis and computation of the main statistical characteristics of hydrologic time series, and Chapter 3 gives the most relevant techniques needed in the other chapters for modeling hydrologic series. Chapters 4 through 8 cover the step-by-step modeling procedures for univariate and multivariate series of annual and shorter time intervals. The last chapter, Chapter 9, gives some general comments on the state of the available hydrologic series and the analysis to be made prior to its stochastic modeling, comments on the relative applicability of the models to water resources problems, and other comments extending or relating to presentations in the previous chapters.

1.1 STOCHASTIC PROCESSES AND TIME SERIES

Consider a variable denoted by $X$. If the outcome of this variable can be predicted with certainty, the variable is said to be a deterministic variable. On the other hand, if the outcome of $X$ cannot be predicted with certainty, then $X$ is a random variable. In the latter case, we can also say that the outcome of $X$ can be expressed in a probability sense or that $X$ is governed by laws of probability. Assume now that the outcome of $X$ can be observed in a sequential manner, say $X_1, X_2, \ldots$, where the subscript may represent intervals of time, distance, etc.
Such a sequence is called a series and when the interval is time, it is called a time series. Often when describing the properties and attributes of time series, the discussion also applies to series in general. Often we shall use only the term series meaning time series. If $X$ is a deterministic variable, then the sequence $X_1, X_2, \ldots$ is a deterministic series. Furthermore, the set of variables $\{X_t\}$ associated with its deterministic governing mechanism or law is called a deterministic process. Similarly, if $X$ is a random variable then $X_1, X_2, \ldots$ is a probabilistic series or in general a stochastic series. Moreover, the set of random variables $X_1, X_2, \ldots$ associated with its underlying probability distribution is called a probabilistic process or in general a stochastic process.

Actually a stochastic process requires the knowledge of the joint probability distribution $f(X_1, X_2, X_3, \ldots)$ of the random variables $X_1, X_2, X_3, \ldots$. If the joint distribution can be factored into the product of the marginal distributions as $f(X_1) \cdot f(X_2) \cdot f(X_3) \cdots$, the process becomes an independent stochastic process and the series is an independent series. Otherwise there is a certain type of serial dependence structure among the variables and the process is called a serially dependent stochastic process and correspondingly a serially dependent time series.

A sample series $X_1, X_2, \ldots, X_N$ obtained from a given stochastic process is called a realization of the stochastic process. There is an infinite number of possible realizations if the distribution of the $X$'s is continuous. Figure 1.1(a) shows two realizations of a stochastic process.

Up to now, our definitions of stochastic process and time series were restricted to outcomes that occur at discrete points in time, although they can be defined in continuous time too.
Since the hydrologic time series dealt with in this text are expressed at discrete times, such as days, weeks, months and years, we will continue our presentation of processes and series in discrete times. However, note that the graphical representation of discrete series is often made as continuous lines just because it is easier to observe the graphical appearance and the overall configuration of the series. Figure 1.1(b) shows a continuous representation of the same discrete series shown in Fig. 1.1(a).

Although all properties of a stochastic process are imbedded in the joint distribution $f(X_1, X_2, \ldots)$, it is convenient to indicate some specific properties such as expected values, variances and covariances (see Appendix A1.2 for the formal definition of these properties). In general the expected value of a stochastic process $X_1, X_2, \ldots$ is composed of the set of expected values at each position in time, namely $E(X_1), E(X_2), \ldots$. Similarly, the set of variances are $\mathrm{Var}(X_1), \mathrm{Var}(X_2), \ldots$. We will also use the notations $\mu_t = E(X_t)$ and $\sigma_t^2 = \mathrm{Var}(X_t)$, $t = 1, 2, \ldots$, to represent the expected values and variances, respectively. Considering any two positions $t$ and $t-k$, the covariance between the variables $X_t$ and $X_{t-k}$ is represented by $\mathrm{Cov}_t(k) = \mathrm{Cov}(X_t, X_{t-k})$. The covariance is the property describing the linear dependence of the stochastic process.

Figure 1.1 (a) Two realizations of a stochastic process with values plotted only at discrete points and (b) the same two realizations of the stochastic process plotted as continuous lines.

A stochastic process (time series) is stationary in the mean or first-order stationary if the expected values do not vary with time, that is $E(X_1) = E(X_2) = \ldots = E(X_t) = E(X) = \mu$. Similarly, when $\mathrm{Var}(X_t) = \sigma^2$, $t = 1, 2, \ldots$, is a constant, the stochastic process is stationary in the variance. A stochastic process is stationary in the covariance when the covariance depends only on the time lag $k$ but it does not depend on the position $t$.
That is, $\mathrm{Cov}(X_t, X_{t-k}) = \mathrm{Cov}(k)$ regardless of $t$. A stochastic process is second-order stationary when it is stationary in the mean and in the covariance. Note that stationarity in the covariance implies stationarity in the variance. A second-order stationary process is also called stationary in the wide sense or weakly stationary. In the above definition, instead of using the term "stationary stochastic process" we can also use the term "stationary time series" or simply "stationary series." If the other statistical properties besides the mean, variance and covariance do not depend on time, the stochastic process is stationary in the strict sense or strongly stationary. Conversely, if any property depends on time, the process is a non-stationary process. However, as the various definitions of stationarity would imply, a process can be stationary in regard to one property, but it can be non-stationary in regard to another property.

1.2 TIME SERIES MODELS

A mathematical model representing a stochastic process is called a "stochastic model" or "time series model." It has a certain mathematical form or structure and a set of parameters. A simple time series model could be represented by a single probability distribution function $f(X; \theta)$ with parameters $\theta = \{\theta_1, \theta_2, \ldots\}$ valid for all positions $t = 1, 2, \ldots$, and without any dependence between $X_1, X_2, \ldots$. For instance, if $X$ is normal with mean $\mu$ and variance $\sigma^2$, the time series model can be conveniently written as

$$X_t = \mu + \sigma \varepsilon_t, \quad t = 1, 2, \ldots \qquad (1.1)$$

where $\varepsilon_t$ is also normal with mean zero and variance one and $\varepsilon_1, \varepsilon_2, \ldots$ are independent. In Eq. (1.1) the model has the parameters $\mu$ and $\sigma$ and since they are constants (do not vary with time) the model is stationary. The structure of the model is simple since the variable $X_t$ is a function only of the independent variable $\varepsilon_t$ and so $X_t$ is also independent.
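
As a minimal sketch of Eq. (1.1), the independent normal model can be generated with Python's standard library. The function name, the parameter values (mean 100, standard deviation 20), the series length and the random seed are assumptions chosen only for illustration; they do not come from any data set in this book.

```python
import random
import statistics


def generate_independent_series(mu, sigma, n, seed=0):
    """Generate X_t = mu + sigma * eps_t, with eps_t independent N(0, 1)."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]


# Hypothetical parameter values, used only to illustrate Eq. (1.1).
series = generate_independent_series(mu=100.0, sigma=20.0, n=5000, seed=1)

sample_mean = statistics.fmean(series)
sample_std = statistics.stdev(series)
print(f"sample mean = {sample_mean:.2f}, sample std = {sample_std:.2f}")
```

With a long generated series, the sample mean and standard deviation should fall close to the chosen values of the two parameters, which is the sense in which constant parameters define a stationary model.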
A time series model with dependence structure can be formed as

$$\varepsilon_t = \phi \, \varepsilon_{t-1} + \xi_t \qquad (1.2)$$

where $\xi_t$ is an independent series with mean zero and variance $(1 - \phi^2)$, $\varepsilon_t$ is the dependent series, and $\phi$ is the parameter of the model. In Eq. (1.2) $\varepsilon_t$ is a dependent series because in addition to being a function of $\xi_t$, it is a function of the same variable $\varepsilon$ at time $t-1$. If $\varepsilon_t$ in Eq. (1.1) is represented by the dependent model of Eq. (1.2), then $X_t$ also becomes a dependent model. In this case the parameters of the model of $X_t$ would be $\mu$, $\sigma$ and $\phi$.

Since the parameters of the above models are constants, the models are stationary, representing stationary time series or stationary stochastic processes. Non-stationary models would result if such parameters varied with time.

1.3 TIME SERIES MODELING

Assume we have a sample time series $X_1, X_2, \ldots, X_N$ of size $N$, such as $N$ years of annual streamflows. We would like to find a mathematical model that represents such a time series. The techniques and procedures for finding such a model are called "time series modeling." Time series modeling is a process which can be simple or complex, depending on the characteristics of the available sample series, on the type of model to use and on the selected techniques of modeling. For instance, series with statistical characteristics that do not vary with time usually lead to models and modeling techniques which are simpler than those of series with time-varying characteristics. There are several types of stochastic models which can be used to represent a time series. Some are more complex than others. For a particular type of model, there are various techniques for estimating the parameters of the model and for testing how good the model is. Also, in this case some techniques are more complex than others. Much of the simplicity or complexity of the modeling process ultimately depends on the modeler, such as the modeler's theoretical knowledge and practical experience.
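
The dependent model of Eq. (1.2) and the idea of finding a model for a sample series can be sketched together. The example below is only an illustration under stated assumptions, not the procedure developed in later chapters: it generates a series by combining Eqs. (1.1) and (1.2), and then estimates the dependence parameter from the lag-one sample autocorrelation, which is a moment estimate for this first-order model. The function names, parameter values and seed are hypothetical.

```python
import math
import random
import statistics


def generate_dependent_series(mu, sigma, phi, n, seed=0):
    """Generate X_t = mu + sigma * eps_t, with eps_t following Eq. (1.2)."""
    if abs(phi) >= 1.0:
        raise ValueError("phi must satisfy |phi| < 1 for a stationary model")
    rng = random.Random(seed)
    noise_std = math.sqrt(1.0 - phi * phi)  # std of xi_t, variance (1 - phi^2)
    eps = rng.gauss(0.0, 1.0)  # starting value drawn with variance one
    series = []
    for _ in range(n):
        eps = phi * eps + noise_std * rng.gauss(0.0, 1.0)  # Eq. (1.2)
        series.append(mu + sigma * eps)                    # Eq. (1.1)
    return series


def lag_one_autocorrelation(x):
    """Lag-one sample autocorrelation, used here as an estimate of phi."""
    m = statistics.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[1:], x[:-1]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


# Hypothetical parameter values, for illustration only.
x = generate_dependent_series(mu=100.0, sigma=20.0, phi=0.6, n=5000, seed=2)
sample_mean = statistics.fmean(x)
phi_hat = lag_one_autocorrelation(x)
print(f"sample mean = {sample_mean:.2f}, estimated phi = {phi_hat:.3f}")
```

Because the variance of the independent term is taken as one minus the square of the dependence parameter, the dependent series keeps a variance of one, so the mean and standard deviation parameters retain their meaning.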
In general, time series modeling can be organized in the following stages (Box and Jenkins, 1970):

1) the selection of the type of model,
2) the identification of the form of the model,
3) the estimation of the model parameters, and
4) the diagnostic check of the model.

The first stage refers to selecting a type of model among the various types of models available to the analyst. For instance, two common types of models for representing the dependence of time series are the Markov chain model and the autoregressive model. For a particular case, the modeler may have to choose between these two types of models. Once a type of model is selected, the next stage is to identify the form or the order of the model. For instance, if autoregressive models were selected in the first stage, then we need to identify the order of the autoregressive model, say order one (one autoregressive coefficient or one parameter), order two, etc. The third stage is to estimate the parameters of the model identified in stage two and some checks are made on the conditions to be met by the estimated parameters. The final stage of the modeling is to make some diagnostic checks to verify how good the model is. The overall time series modeling is actually an iterative process with feedback and interaction between each of the above-referred stages.

1.4 PHYSICAL BASIS OF TIME SERIES MODELING IN HYDROLOGY

Differences always exist between the true and estimated models and between the true and estimated model parameters. These differences represent modeling uncertainties. One way of decreasing such uncertainties is by selecting the model which best represents the physical reality of the system. Sometimes it may be feasible to use physical laws to infer what should be the mathematical expressions of the corresponding stochastic models of hydrologic series.
This inference is very much contingent on how well the application of physical laws fits the natural hydrologic phenomena, and how various errors and complexities further affect differences between the true and the inferred mathematical models.

The modeling of streamflow time processes has essentially followed two approaches: the deterministic or physical simulation of the hydrologic system, and the statistical or stochastic simulation of the system. In the first approach, the hydrologic system is described and represented by theoretical and/or empirical physical relationships. There is always a unique correspondence between the input, say precipitation, and the output, say streamflow. Within this approach, two representative models are the well-known Stanford Watershed Model (Crawford and Linsley, 1966) and the MIT model (Harley et al., 1970). On the other hand, in the stochastic approach, a type of model is assumed with the aim of representing the most relevant statistical characteristics of the historic series. Within this approach, the most widely used models have been the autoregressive models (Thomas and Fiering, 1962; Yevjevich, 1963). Subsequently, other deterministic and stochastic models appeared in the literature.

Several arguments have been given by the advocates of deterministic and stochastic approaches to streamflow modeling and simulation; arguments usually in favor of their own and/or against the other approach. In spite of that, Yevjevich (1963), Thomas (1965) and Fiering (1967) tried to set the physical basis of stochastic modeling, at least for the case of the autoregressive models. In the 1970's, a tendency was observed for linking and reconciling both the deterministic and stochastic approaches (Quimpo, 1971).
On the one hand, the deterministic approach treats the precipitation as a random variable and transfers such randomness to streamflow while keeping its deterministic framework; on the other hand, physical justification of stochastic models is becoming relevant not only for operational purposes, but also for explaining certain controversial aspects in stochastic hydrology, such as the Hurst phenomenon. Following Quimpo's lead, other papers appeared in the literature such as those by Moss and Bryson (1974), concerning the physical basis of seasonal stochastic models, Klemes (1973), concerning the modeling of watershed runoff based on concepts of semi-infinite storage reservoirs, O'Connor (1976), extending Quimpo's work and relating the unit hydrograph and flood routing models to autoregressive and moving average models, and by Pegram (1977), Selvalingam (1977) and Dawdy et al. (1978) in providing the physical justification of continuous stochastic streamflow models.

As an example of joint physical and statistical analysis in inferring on the model, it can be demonstrated that if the river flow recession is a simple exponential function $Q_t = Q_0 \exp(-Kt)$, where $Q_0$ is the flow at the beginning of a year and $K$ is the recession constant, then the time dependent annual runoff series $Y_i$ should follow the first order autoregressive model AR(1), namely

$$Y_i - \bar{Y} = \phi \, (Y_{i-1} - \bar{Y}) + \varepsilon_i$$

where $\bar{Y}$ is the mean of $Y_i$, $\phi$ is the autoregression coefficient and $\varepsilon_i$ is the independent stochastic component (white noise) (Yevjevich, 1963). It is not difficult to find that the recession constant $K$ and the autoregressive parameter $\phi$ are related as $\phi = \exp(-K)$. However, how close the AR(1) model is to the true model of the annual runoff series depends on how good the assumption of exponential river flow recessions is.

As an example of the physical justification of autoregressive and moving average (ARMA) models for annual streamflow simulation, let us consider a watershed system as in Fig.
1.2, where the variables are annual values. Then the annual streamflow $z_t$ is composed of the groundwater contribution, equal to $c\,S_{t-1}$, and the surface runoff, equal to $d\,x_t$ (Fiering, 1967). That is,

$$z_t = c\,S_{t-1} + d\,x_t \qquad (1.3)$$

The continuity equation for the groundwater storage $S_t$ gives

$$S_t = S_{t-1} + a\,x_t - c\,S_{t-1}$$

or

$$S_t = (1-c)\,S_{t-1} + a\,x_t \qquad (1.4)$$

[Figure 1.2. Conceptual representation of the precipitation-streamflow process (Salas and Smith, 1980a). Labels recoverable from the figure: precipitation $x_t$, evaporation $b\,x_t$, infiltration $a\,x_t$, surface runoff $(1-a-b)\,x_t = d\,x_t$, groundwater storage with contribution $c\,S_{t-1}$, and streamflow $z_t$, with $0 \le a, b, c \le 1$ and $0 \le a+b \le 1$.]

Combining Eqs. (1.3) and (1.4), Salas and Smith (1980a) showed that the model for the annual streamflow $z_t$ can be written as

$$z_t = (1-c)\,z_{t-1} + d\,x_t - [d(1-c) - ac]\,x_{t-1} \qquad (1.5)$$

which has the form of an ARMA(1,1) model when the annual precipitation is an independent series. (The storage terms are eliminated by writing Eq. (1.3) for year $t-1$, which gives $c\,S_{t-2} = z_{t-1} - d\,x_{t-1}$, and substituting Eq. (1.4) lagged one year into Eq. (1.3).) They also extended the above formulation to the general case of ARMA(p,q) models.

1.5 REPRODUCTION OF HISTORICAL STATISTICAL CHARACTERISTICS

Models are built to "reproduce" or to "resemble" the main statistical characteristics of the historical hydrologic time series. Such reproduction or resemblance is understood to be in the statistical sense. It does not mean that a generated series based on the model has to give exactly the same statistical characteristics as shown by the historical record. This brings up the questions of what statistical characteristics are to be reproduced by the model and how these characteristics should be interpreted or understood.

Unfortunately, there are no unique and easy answers to the above questions. First of all, the true or population statistical characteristics of hydrologic series are never known, because what is observed or measured is only a finite (sample) number of years (N), and as a result the characteristics derived from such samples are only estimates of the true (unknown) characteristics.
Those estimates from a sample of N years are uncertain because if, instead of N years of observation, a different number of years N', either smaller or larger than N, were observed, then the estimates based on the sample of N' years would be different from those based on N years. The values observed in the historical series of any given number of years are only one realization of the infinite number of possible realizations that might have occurred during that time. Consequently, the statistical characteristics derived (estimated) from that sample are only one possible estimate out of many others. That is, the sample estimates are random variables and so they are uncertain. Whenever possible and necessary, such uncertainty must be incorporated in the modeling of hydrologic time series.

Apart from the problem of uncertain statistical characteristics for a given time series sample, there is the problem of definition and interpretation of the statistical characteristics derived from the sample. The main characteristics are the mean, standard deviation, skewness and autocorrelation. Usually the mean and standard deviation are the least uncertain characteristics. Therefore, no one questions their reproduction by a given model. The skewness is highly uncertain, so whether or not a model should reproduce precisely the estimated skewness depends on how long the sample is and how important the skewness is for the model application. For instance, if a reservoir is designed for almost full river development, then the skewness is not very important. However, if it is designed for a low level of development, then the skewness is important (Klemes, 1972). The autocorrelation is also very uncertain, especially for small sample sizes. Its interpretation often decides the type of model to be used as well as its form.
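The sampling uncertainty described above can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the "true" process is taken to be an AR(1) model with normal noise, and the mean (100), standard deviation (30), coefficient (0.4), record lengths (25 and 200 years) and all function names are hypothetical choices made for illustration, not values from any historical record. Many synthetic records are generated and the spread of each estimated characteristic across those records is reported.

```python
import math
import random
import statistics

def generate_ar1(n, mean, sd, phi, rng):
    """Generate n values of a stationary AR(1) series with normal noise."""
    noise_sd = sd * math.sqrt(1.0 - phi ** 2)  # keeps the marginal sd equal to sd
    y = mean + sd * rng.gauss(0.0, 1.0)        # start from the stationary distribution
    series = []
    for _ in range(n):
        y = mean + phi * (y - mean) + noise_sd * rng.gauss(0.0, 1.0)
        series.append(y)
    return series

def sample_stats(x):
    """Return sample mean, standard deviation, skewness and lag-1 autocorrelation."""
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    var = sum(d * d for d in dev) / n
    sd = math.sqrt(var)
    skew = (sum(d ** 3 for d in dev) / n) / sd ** 3
    r1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / (n * var)
    return m, sd, skew, r1

def spread_of_estimates(n_years, n_samples=500, seed=1):
    """Standard deviation of each estimator across many synthetic records."""
    rng = random.Random(seed)
    stats = [sample_stats(generate_ar1(n_years, 100.0, 30.0, 0.4, rng))
             for _ in range(n_samples)]
    return [statistics.pstdev(column) for column in zip(*stats)]

short = spread_of_estimates(25)    # assumed short record, N = 25 years
long_ = spread_of_estimates(200)   # assumed long record, N' = 200 years
for name, s, l in zip(("mean", "std dev", "skewness", "lag-1 r"), short, long_):
    print(f"{name:9s} spread for N=25: {s:7.3f}   for N=200: {l:7.3f}")
```

Under these assumptions the spread of every estimate shrinks as the record lengthens, which is the sense in which sample estimates are random variables whose uncertainty depends on N.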
Some other statistical characteristics of hydrologic series are often important to look at, even though they may depend on the main characteristics already mentioned. The range (related to storage capacities of reservoirs) and the run (a property related to droughts) are additional characteristics, important in water resources studies, that may be derived from a series. However, their interpretation and their ultimate use in hydrologic series modeling have led to controversies among hydrologists. The range of cumulative departures from the sample mean is related to the minimum storage capacity required for a reservoir to deliver the sample mean throughout a time period equal to the length of the sample (N). A related statistic, the rescaled range (the range divided by the sample standard deviation), is proportional to $N^{1/2}$ as $N \to \infty$ for the models most typically used in hydrology, such as the AR and ARMA models. Analysis made by Hurst (1951), using long records of geophysical time series, appears to show that their rescaled range is proportional to $N^h$ with $h > 1/2$ (while for models such as those referred to above $h = 1/2$). This apparent discrepancy has been called the "Hurst phenomenon." Extensive arguments on issues such as the interpretation of the Hurst phenomenon per se, the uncertainty of the estimates of the exponent $h$, the models to be used to reproduce the Hurst phenomenon, and its impact on design and operation over the typical 50-100 year planning horizons, have been raised among hydrologists during the past 30 years. However, studies such as those carried out by Yevjevich (1965), Fiering (1967), O'Connell (1971), Klemes (1974), Delleur et al. (1976), Hipel and McLeod (1978) and Salas et al.
(1979), lead us to conclude that simple models such as the AR and ARMA models used in this book are, for most cases of hydrologic series, capable of reproducing the necessary range statistics related to water resources planning problems.

Just as the consideration of range statistics has led to controversy in stochastic hydrology, the interpretation and reproduction of drought related statistics have also been controversial. It has been a common procedure to use the critical drought (longest run length or largest run sum) of a historic record for making decisions related to designing and operating reservoirs. However, historic droughts, just like any other statistical characteristic, are random variables. For instance, the average drought duration or the largest (critical or most severe) drought derived from a historic sample of size N are random variables. Therefore, when stochastic models and corresponding simulations are to be used for design and operation of reservoirs, the problem is how to incorporate such historic droughts into the model, or rather what the drought characteristics are and how they should be reproduced by the model. One criterion may be to reproduce the average drought duration and another may be to reproduce the critical drought. However, such reproduction should be made in a statistical sense, that is, the model should reproduce, say, critical droughts with a given probability of occurrence during a period equal to the historic sample size N. For instance, if such probability is 50%, it would mean that the return period of the critical drought is of the order of N years. Actually, there is not a unique answer to the questions raised above. Ultimately, it depends on judgment and the analysis of each particular case.

1.6 TIME SERIES MODELS IN HYDROLOGY

Early studies by Hazen (1914) and Sudler (1927) showed the feasibility of using statistics and probability theory in analyzing river flow sequences.
Hurst (1951), in investigating the Nile River for the Aswan Dam project, reported studies of long records of river flows and other geophysical time series, which years later had a tremendous impact on the theoretical and practical aspects of time series analysis of hydrologic and geophysical phenomena. Barnes (1954) extended the early empirical studies of Hazen and Sudler and introduced the idea of synthetic generation of streamflow by using a table of normal random numbers. However, it was not until the beginning of the 1960's that the formal development of stochastic modeling started, with the introduction and application of autoregressive models for annual and seasonal streamflows (Thomas and Fiering, 1962; Yevjevich, 1963). Since then, several groups around the world have engaged in extensive research efforts toward improving those early concepts and models, providing physical justification of some models, introducing alternative models, and studying their impacts on water resources systems planning, design and operation. The literature related to these various aspects is extensive and has been reviewed by several hydrologists, such as Chiu (1972), Rodriguez-Iturbe et al. (1972), Klemes (1974), Jackson (1975a), Lawrance and Kottegoda (1977), and McLeod and Hipel (1978).

Several stochastic models have been proposed in the past for modeling hydrologic time series. They are: autoregressive models (AR) (Thomas and Fiering, 1962; Yevjevich, 1963; Matalas, 1967), fractional Gaussian noise models (FGN) (Mandelbrot and Wallis, 1968; Matalas and Wallis, 1971), autoregressive moving-average models (ARMA) (Carlson et al., 1970; O'Connell, 1971), broken-line models (BL) (Mejia, 1971), shot-noise models (Weiss, 1973), models of intermittent processes (Yakowitz, 1973; Kelman, 1977), disaggregation models (Valencia and Schaake, 1973), Markov mixture models (Jackson, 1975b), ARMA-Markov models (Lettenmaier and Burges, 1977), and general mixture models (Boes and Salas, 1978).
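The range and run statistics of Sec. 1.5, against which the models listed above are commonly judged, can be computed directly from a series. The sketch below is illustrative only: the flow values are made up, the function names are hypothetical, and the sample mean is assumed as the drought threshold. It computes the rescaled range (range of cumulative departures from the sample mean divided by the sample standard deviation) and the runs of consecutive values below the threshold, from which the longest run length and the largest run sum follow.

```python
import math

def rescaled_range(x):
    """Range of cumulative departures from the sample mean, divided by the sample sd."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    cumulative, partial_sums = 0.0, [0.0]
    for v in x:
        cumulative += v - mean
        partial_sums.append(cumulative)
    return (max(partial_sums) - min(partial_sums)) / sd

def drought_runs(x, threshold):
    """Return (run length, run sum of deficits) for each run of values below threshold."""
    runs, length, deficit = [], 0, 0.0
    for v in x:
        if v < threshold:
            length += 1
            deficit += threshold - v
        elif length:
            runs.append((length, deficit))
            length, deficit = 0, 0.0
    if length:  # close a run that lasts until the end of the record
        runs.append((length, deficit))
    return runs

# hypothetical annual flows, for illustration only
flows = [92, 110, 85, 70, 64, 120, 135, 98, 77, 81, 66, 140, 105, 90, 118]
mean_flow = sum(flows) / len(flows)
runs = drought_runs(flows, mean_flow)
print("rescaled range  :", round(rescaled_range(flows), 3))
print("longest run     :", max(length for length, _ in runs))
print("largest run sum :", round(max(deficit for _, deficit in runs), 1))
```

Applying the same functions to many generated series of length N, rather than to the single historical record, gives the distribution of these statistics, which is the statistical sense of reproduction referred to in Sec. 1.5.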
Supposedly, all of these models have been developed and proposed with the objective of reproducing the main statistical characteristics which are observed or identified in hydrologic time series and which have a bearing on the design and/or operation of the water system under study. Although each model has its own merit and some of them can be successfully applied in operational hydrology, they do have limitations. They all have been criticized for one or more of the following reasons: (i) not being able to reproduce short-term dependence, (ii) not being able to reproduce long-term dependence, (iii) difficulty in estimating parameters, (iv) limitations for generating large samples of synthetic data, (v) lack of physical basis, and (vi) too many parameters. It would take an extensive discussion beyond the intent of this book to give an account of the advantages and limitations of each of the above-mentioned models. The interested reader may wish to go back to the original publications as well as to the reviews referred to above. Specific comments on the limitations of AR, ARMA, ARIMA and disaggregation models are given in Sections 4.2.7, 4.3.7, 5.2.8, 5.3.6, 6.2.6, 6.3.6, 6.3.7, 7.2.6, 7.3.6, and 8.8.

The experience in using time series analysis in hydrology shows that the judicious use of AR, ARMA, ARIMA and disaggregation models will generally produce satisfactory results for most practical cases of operational hydrology. This book is limited to these models. Other models are not included either because they are too complex for this book, or because they are not well incorporated into current state-of-the-art practice.

1.7 TIME SERIES MODELING IN HYDROLOGY

The exact mathematical models of hydrologic time series are never known. The inferred population models are only approximations. The exact model parameters are also never known in hydrology since they must be estimated from limited data.
Estimation of models and their parameters from available data is often referred to in the literature as time series modeling or stochastic modeling of hydrologic series. Although the development of time series modeling in hydrology has reached some degree of sophistication, unfortunately most time series modeling in practice is still based on simple methods. This usually involves the selection of a time series model (usually an AR model) in advance, and the estimation of its parameters by the method of moments without verifying their constraints, without testing the goodness of fit of the model, and without checking it against competing models. It seems that one of the main reasons for this situation is that, although many statistical books are available with the above-mentioned techniques, and many research papers are published in several journals (Water Resources Research, Journal of Hydrology, Hyd. Div. ASCE, etc.), they appear to be out of the reach of most practicing engineers due to the language barrier and the mathematical complexity used. Another reason is that the available books, monographs and handbooks in hydrology which include stochastic modeling are either too cumbersome to read or too oversimplified. Therefore, practitioners have two choices: either they do not follow the published literature because it was not written at their level, or else they follow a very simplified and limited modeling approach. This book intends to close the gap between research and practice by describing up-to-date advances in modeling in a systematic step-by-step approach, including various detailed examples of modeling hydrologic time series.
A systematic approach to hydrologic time series modeling may be composed of six main phases (Salas and Smith, 1980b): (1) identification of model composition, (2) selection of model type, (3) identification of model form, (4) estimation of model parameters, (5) testing goodness of fit of the model, and (6) evaluation of uncertainties. Figure 1.3 illustrates graphically the interaction between the above six modeling phases.

In general, in any modeling of hydrologic time series, one has to decide whether the model will be a univariate or a multivariate model, or a combination of a univariate and a disaggregation model, or a combination of a multivariate and a disaggregation model, etc. This decision is referred to herein as the identification of the model composition. Such identification generally depends on the characteristics of the overall water resource system, the characteristics of the hydrologic time series, and the modeler's input.

[Figure 1.3. Systematic approach of hydrologic time series modeling (Salas and Smith, 1980b). The flowchart links the characteristics of the overall water resources system, the characteristics of hydrologic time series, the characteristics of hydrologic physical processes, and the modeler's input (knowledge, experience, bias, limitations) to the six phases: identification of model composition, selection of model type, identification of model form, estimation of model parameters, testing goodness of fit of the model, and evaluation of uncertainties.]

For instance, to analyze the operation of a reservoir by simulation, monthly inflows to such reservoir must be generated. If there are no other upstream reservoirs or structures that may affect the operation of such reservoir, the univariate modeling of monthly streamflows at or near the site of the dam should be selected. On the other hand, if other reservoirs exist or are planned upstream from the reservoir under study, the multivariate modeling of monthly streamflows at various sites should be the choice.
However, instead of multivariate modeling of monthly streamflows, the modeler may select the multivariate modeling of annual series and then use the disaggregation model to obtain the corresponding monthly flows. The above decisions are contingent on the availability of adequate data in the system under study, as well as on their statistical characteristics. For instance, two time series which show significant cross correlation will require bivariate modeling, but if the cross correlation is not significant the two time series can be modeled independently by univariate models.

Once the model composition is identified, the type of the model(s) must be selected. Namely, the modeler has to decide on one among the various alternative models, say AR (autoregressive), ARMA (autoregressive moving average), ARIMA (autoregressive integrated moving average), FGN (fractional Gaussian noise), BL (broken line), SL (shifting level) or any other model that is available in stochastic hydrology. In this decision, three factors are important: the characteristics of hydrologic physical processes, the characteristics of hydrologic time series, and the modeler's input. In this book we have already restricted the several models available to AR, ARMA, ARIMA, and disaggregation models. Even though ARIMA models include both AR and ARMA models, we have purposely separated them because for practitioners it is easier to understand and to deal with AR models, then with ARMA and finally with ARIMA models.

The physical characteristics of hydrologic processes help in the selection of the types or the alternative types of models. In fact, we selected the AR and ARMA models because of the physical reasons discussed in Sec. 1.4.

The statistical characteristics of the samples of hydrologic series are important deciding factors in the selection of the type of model. For instance, series with slowly decaying correlograms (long memory) generally require ARMA models rather than AR models.
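The remark about slowly decaying correlograms can be made concrete by comparing theoretical correlograms. The sketch below is only an illustration under assumptions: the ARMA(1,1) model is written with the common sign convention $z_t = \phi z_{t-1} + \varepsilon_t - \theta \varepsilon_{t-1}$, the parameter values ($\phi = 0.9$, $\theta = 0.7$) are arbitrary, and the function names are hypothetical. An AR(1) model is given the same lag-one correlation as the ARMA(1,1) model, and the two correlograms are compared at longer lags.

```python
def acf_ar1(phi, max_lag):
    """Theoretical autocorrelation of AR(1): rho_k = phi**k."""
    return [phi ** k for k in range(max_lag + 1)]

def acf_arma11(phi, theta, max_lag):
    """Theoretical autocorrelation of ARMA(1,1) with z_t = phi*z_{t-1} + e_t - theta*e_{t-1}."""
    rho1 = (1 - phi * theta) * (phi - theta) / (1 + theta ** 2 - 2 * phi * theta)
    acf = [1.0, rho1]
    for _ in range(2, max_lag + 1):
        acf.append(phi * acf[-1])   # rho_k = phi * rho_{k-1} for k >= 2
    return acf[:max_lag + 1]

arma = acf_arma11(0.9, 0.7, 10)     # assumed, arbitrary parameter values
ar = acf_ar1(arma[1], 10)           # AR(1) matched to the same lag-one correlation

for lag in (1, 2, 5, 10):
    print(f"lag {lag:2d}:  AR(1) {ar[lag]:.4f}   ARMA(1,1) {arma[lag]:.4f}")
```

With these assumed values both models share the same lag-one correlation, but the AR(1) correlogram dies out within a few lags while the ARMA(1,1) correlogram decays at the slower rate $\phi$, which is why a sample correlogram that starts low and decays slowly points toward the ARMA form.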
Monthly series whose annual series has slowly decaying correlograms generally require a two level modeling: an ARMA model for the annual series and the disaggregation model to obtain the monthly series.

The model type selection ultimately depends on the modeler. In reality this is probably one of the most important factors. The modeler's knowledge of the advantages and limitations of the various types of models will enable him to make better decisions and to have a choice. On the other hand, if he only knows about AR models, then he is bound to choose that model whether it is appropriate or not for the particular problem at hand. The modeler's experience helps substantially. Experience will tell when a model can be used or should not be used. A common problem with most modelers is personal bias. This is an unfortunate situation, although perhaps unavoidable. Personal biases are especially important with those (usually researchers) who were directly or indirectly involved with the development of a given type of model or are "aligned" with it. Some think that AR models are the answer for everything, while others think the same about ARMA models. Some others think that FGN or BL models are the last word in modeling, while others believe that SL models will handle all cases. There are, of course, some exceptions, the "non-aligned or third world" modelers who are willing to look at more than one side. Finally, the modeler's limitations, such as time available to solve a particular problem, funds for computer time, and availability of ready made programs of alternative models, also contribute to the decision of selecting the type of model (Salas and Smith, 1980b).

Once the type of model is selected, the third phase of the modeling is to identify the form of the model. This identification as implied herein goes beyond determining the orders p and q, say of an ARMA model, as in the Box-Jenkins approach.
For instance, in time series analysis of weekly streamflows, it is necessary to identify whether the series is skewed and if such skewness is constant or periodic, whether the week-to-week correlation coefficients are periodic, and whether the periodic characteristics should be described by Fourier series, in addition to identifying the order, say, of an ARMA model. The statistical characteristics of the historic time series are important for such model identification. In this case the knowledge and experience of the modeler also play a significant role.

Once the model is identified, the estimation of the parameters of the model is made. The proper method of estimation should be selected. The method of moments and the (approximate) method of maximum likelihood are the two methods usually available. Generally, the latter method gives the best estimators. In any case, the estimated parameters must comply with certain conditions of the model, which should be checked. If these conditions are not met, an alternative form of the model is required.

The model estimated in phase (4) needs to be checked in order to verify whether it complies with certain assumptions about the model and to verify how well it represents the historical hydrologic time series. The model assumptions to be checked are usually the independence and normality of the residuals of the model (for instance, the series $\varepsilon_t$ of Eq. 1.2). In addition, comparisons based on correlograms can be made to see if the model correlogram resembles the historical correlogram. Further comparisons, based on data generation, can be made, which help to verify whether the model reproduces statistically the historical statistics such as the means, variances, skewness, correlations, storage related statistics, drought related statistics, etc.
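The estimation and checking phases can be sketched for the simplest case. The example below is a hedged illustration, not the procedure of any particular chapter: it assumes an AR(1) model fitted by the method of moments, uses a synthetic "historical" record generated with made-up parameters, and assumes the Box-Pierce portmanteau statistic as one common way of testing the independence of residuals. All function names are hypothetical, and the critical value is the tabulated 95% chi-square quantile for 9 degrees of freedom (10 lags minus one estimated autoregressive parameter).

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation coefficient at the given lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    ck = sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag))
    return ck / c0

def fit_ar1_moments(y):
    """Method-of-moments AR(1) fit: returns mean, phi and noise variance."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    phi = autocorr(y, 1)
    if not abs(phi) < 1.0:   # condition the estimated parameter must satisfy
        raise ValueError("estimated phi violates the stationarity condition |phi| < 1")
    return mean, phi, var * (1.0 - phi ** 2)

def residuals_ar1(y, mean, phi):
    """Residual series of the fitted AR(1) model."""
    return [(y[t] - mean) - phi * (y[t - 1] - mean) for t in range(1, len(y))]

def portmanteau(res, max_lag):
    """Box-Pierce statistic: N times the sum of squared residual autocorrelations."""
    return len(res) * sum(autocorr(res, k) ** 2 for k in range(1, max_lag + 1))

# synthetic "historical" record, purely illustrative
rng = random.Random(7)
y, state = [], 0.0
for _ in range(300):
    state = 0.5 * state + rng.gauss(0.0, 1.0)
    y.append(50.0 + 10.0 * state)

mean, phi, noise_var = fit_ar1_moments(y)
res = residuals_ar1(y, mean, phi)
q = portmanteau(res, 10)
CHI2_95_DOF9 = 16.919   # tabulated 95% chi-square quantile, 9 degrees of freedom
print(f"mean = {mean:.2f}, phi = {phi:.3f}, noise variance = {noise_var:.2f}")
print(f"Q = {q:.2f}; residual independence",
      "not rejected" if q < CHI2_95_DOF9 else "rejected", "at the 5% level")
```

A normality check of the residuals and the comparisons based on generated series would follow the same pattern; they are omitted here to keep the sketch short.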
If the above checks and comparisons are not satisfactory, the model form or even the model type should be changed and the procedure repeated until a satisfactory model is found.

Once the model is judged to be adequate, it remains to evaluate the corresponding uncertainties. Two kinds of uncertainties are usually encountered in hydrologic time series analysis: (a) model uncertainty, and (b) parameter uncertainty. Model uncertainty results because the true models of hydrologic time series are not known, and at best the identified model composition and the selected type and form of the model are only close approximations. Parameter uncertainty results because the model parameters are estimated from a limited amount of data. Model uncertainty may be evaluated by testing whether significant differences exist in the statistics generated by alternative models. Parameter uncertainty may be determined by finding the distribution of the parameter estimates, and by using the models with parameters sampled from such distributions. Other chapters of this book discuss in more detail the problem of parameter uncertainty for the AR, ARMA and ARIMA models.

1.8 APPLICABILITY OF TIME SERIES MODELING IN HYDROLOGY

Time series modeling has mainly two uses in hydrology and water resources: (1) generation of synthetic hydrologic time series, and (2) forecasting of future hydrologic series. Generation of synthetic series is generally needed for reservoir sizing, for determining the risk of failure (or reliability) of water supply for irrigation systems, for determining the risk of failure of dependable capacities of hydroelectric systems, for planning studies of future reservoir operation, for planning capacity expansion of water supply systems, and similar applications.
Forecasts of hydrologic series are generally needed for short term planning of reservoir operation, for real time and short term operation of river basins or systems, for planning operation during an ongoing drought, and similar applications. Additional discussion on modeling applications is given in Secs. 4.2.8, 4.3.8, 5.2.9, 5.3.7, 6.2.7, 6.3.7, 6.3.8, 7.2.7, 7.3.7, 8.9 and 9.3.

APPENDIX A1.1 DEFINITIONS, TERMS AND NOTATION

Some definitions and terms are given herein so that the reader has less difficulty in following the material presented in this book. Also, the notation has been carefully selected to make the presentation of equations and models clear and to follow, as much as feasible, the standard notation commonly encountered in the published literature.

DEFINITIONS AND TERMS

Normalization: The operation by which a time series is transformed into normal.
Standardization: The operation by which a time series with a given mean and standard deviation is converted into a series with mean zero and standard deviation one.
Independent series: Time series which does not have any dependence in time or in space.
Independent stochastic component: Time series which is independent in time and identically distributed.
White noise: Same as independent stochastic component but normally distributed.
Periodic series: Time series with periodic components or periodic parameters.
Seasonal series: Time series with time intervals that are a fraction of the year.
Historical series: Time series measured in the past.
Original series or data: Series or data available before any analysis is made.
AR model: Autoregressive model.
ARMA model: Autoregressive moving average model.
ARIMA model: Autoregressive integrated moving average model.
Empirical distribution: Frequency distribution of data (no reference to any certain probability distribution function).
Normal (0,1): Normal distribution with mean zero and variance one.
Population: Theoretical, true or known distribution, parameter or statistical property.
Sample: Observed, assumed or generated data of a limited size.
Estimate: Distribution, parameter or any statistical property estimated from a sample.

NOTATION

Item: Description
Population parameter: Lowercase Greek letters and Roman letters. Example: $\phi$, $\theta$, $b$, $A$
Estimated parameter: Lowercase Greek and Roman letters with caret. Example: $\hat{\phi}$, $\hat{\theta}$, $\hat{b}$, $\hat{A}$, $\hat{Y}$
Normal transformation: $Y = g(X)$. Example: $Y = \log(X)$
Inverse normal transformation: $X = g^{-1}(Y)$. Example: $X = \log^{-1}(Y) = \mathrm{antilog}(Y)$
Logarithms: $\log_e(X)$, $\ln(X)$: base e; $\log_{10}(X)$: base 10; $\log(X)$: any base
Original data: $X$, $x$
Normalized data: $Y$, $y$
Standardized data: $Z$, $z$
Stochastic component or independent series: $\varepsilon$ [...]
Variance: $\mathrm{Var}(X)$, $\sigma^2$, $s^2$
Standard deviation: $\sigma$, $s$
Covariance: $\mathrm{Cov}(X, Y)$, $s_{xy}$
Covariance matrix of parameters: $V(\cdot, \cdot)$ [...]
Autocovariance function: $c_k$
Autocorrelation function: ACF
Population autocorrelation function: $\rho_k$ [...]
Sample autocorrelation function: $r_k$ [...]
Partial autocorrelation function: $\phi_k(k)$, PACF
Spectral density function: $g(f)$
Univariate series: [...]
Multivariate series: [...]
Generated series: [...]
A converges to B: $A \to B$
R proportional to n: $R \propto n$
A equivalent to B: [...]
Inverse of matrix A: $A^{-1}$
Transpose of matrix A: $A^T$
Derivative of $S$ with respect to [...]: [...]

APPENDIX A1.2. ELEMENTARY STATISTICAL PRINCIPLES

Elementary statistical principles used in various chapters of this book are briefly reviewed herein. The purpose is to give handy access to some basic definitions and properties. For a formal presentation of this subject the reader is referred to standard textbooks on probability and statistics, such as those by Benjamin and Cornell (1970) or Mood et al. (1974).

Random Variable

It is a variable whose outcomes (values) are governed by chance. Its values cannot be predicted with certainty but only in probability terms. Random variables can be discrete or continuous.
Discrete random variables take on values only at discrete (specified) points, while continuous random variables can take on any value on the real axis or any value between two boundary values.

Probability Distribution Function

It is a function that defines the probability associated with a random variable. It is also called the probability law of a random variable. For instance, if a random variable X can take on only the values 0 and 1 with probabilities 0.3 and 0.7, respectively, then the probability distribution function or probability law is P(X=0) = 0.3 and P(X=1) = 0.7.

The probability distribution function of a discrete random variable may be represented by the probability mass function (PMF) and the cumulative distribution function (CDF). For instance, for a discrete random variable X that takes on the values 1, 2, 3 and 4, the PMF may be P(X=1) = 0.20, P(X=2) = 0.35, P(X=3) = 0.25 and P(X=4) = 0.20. The corresponding CDF is P(X≤1) = 0.20, P(X≤2) = 0.20 + 0.35 = 0.55, P(X≤3) = 0.20 + 0.35 + 0.25 = 0.80 and P(X≤4) = 0.20 + 0.35 + 0.25 + 0.20 = 1.00. These two functions are plotted in Fig. A1.1.

[Figure A1.1. PMF and CDF of a discrete random variable.]

Similarly, the probability distribution function of a continuous random variable is represented by the probability density function (PDF) and the cumulative distribution function (CDF). The PDF of a random variable X, usually denoted by $f(x)$, serves to determine probabilities by integration. That is, the probability of X being between $x_1$ and $x_2$ is obtained by

$$P(x_1 < X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$$

[...]

$$a_{11} > 0, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} > 0, \quad \text{etc.} \qquad (A1.15)$$

A is positive semidefinite if the inequalities in Eq. (A1.15) are replaced by ≥ signs (strictly, semidefiniteness requires all principal minors, not only the leading ones shown in Eq. (A1.15), to be nonnegative).

REFERENCES

Barnes, F. B., 1954. Storage required for a city water supply. J. Inst. Eng., Australia, 26, pp. 198-203.
Benjamin, J. R. and Cornell, C. A., 1970. Probability, Statistics and Decision for Civil Engineers.
McGraw-Hill Book Co., New York.
Boes, D. C. and Salas, J. D., 1978. Nonstationarity in the mean and the Hurst phenomenon. Jour. Water Resour. Res., 14, 1, pp. 135-143.
Box, G. E. P. and Jenkins, G., 1970. Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco.
Carlson, R. F., MacCormick, A. J. A., and Watts, D. G., 1970. Application of linear models to four annual streamflow series. Jour. Water Resour. Res., 6, 4, pp. 1070-1078.
Chiu, C. L., 1972. Stochastic methods in hydraulics and hydrology of streamflow. Geophysical Surveys, 1, pp. 61-84.
Crawford, N. and Linsley, R. K., 1966. Digital simulation in hydrology: Stanford Watershed Model IV. Technical Report 39, Stanford University, Stanford, California.
Dawdy, D. R., Gupta, V. and Singh, V., 1978. Stochastic simulation of droughts. Paper presented at the U.S.-Argentinian Workshop on Droughts, Mar del Plata, Argentina, December.
Delleur, J. W., Tao, P. C. and Kavvas, M. L., 1976. An evaluation of the practicality and complexity of some rainfall and runoff time series models. Jour. Water Resour. Res., 12, 5, pp. 953-970.
Fiering, M. B., 1967. Streamflow Synthesis. Harvard University Press, Cambridge, Massachusetts.
Harley, B. M., Perkins, F. E. and Eagleson, P. S., 1970. A modular distributed model of catchment dynamics. Ralph M. Parsons Laboratory Report 133, Dept. of Civil Eng., MIT, Cambridge, Mass.
Hazen, A., 1914. Storage to be provided in impounding reservoirs for municipal water supply. Trans. Amer. Soc. Civil Eng., 77, pp. 1539-1669.
Hipel, K. W. and McLeod, A. I., 1978. Preservation of the rescaled adjusted range, 2. Simulation studies using Box-Jenkins models. Jour. Water Resour. Res., 14, 3, pp. 509-516.
Hurst, H. E., 1951. Long term storage capacity of reservoirs. Trans. Amer. Soc. Civil Engrs., 116, pp. 770-799.
Jackson, B. B., 1975a. The use of streamflow models in planning. Jour. Water Resour. Res., 11, 1, pp. 54-63.
Jackson, B. B., 1975b. Markov mixture models for drought lengths.
Water Resour. Res., 11, 1, pp. 64-74.
Kelman, J., 1977. Stochastic modelling of hydrologic intermittent daily process. Hydrology Paper 89, Colorado State University, Fort Collins, Colorado.
Klemes, V., 1972. Comments on "Adequacy of markovian models with cyclic components for stochastic streamflow simulation", by I. Rodriguez-Iturbe, David R. Dawdy and Luis E. Garcia. Jour. Water Resour. Res., 8, 6, pp. 1613-1615.
Klemes, V., 1973. Watershed as semiinfinite storage reservoir. ASCE Jour. Irrig. and Drain. Div., 99, IR4, pp. 477-491.
Klemes, V., 1974. The Hurst phenomenon--a puzzle. Jour. Water Resour. Res., 10, 4, pp. 675-688.
Lawrance, A. J. and Kottegoda, N. T., 1977. Stochastic modeling of riverflow time series. Jour. Royal Stat. Soc., A, 140, pp. 1-47.
Lettenmaier, D. P. and Burges, S. J., 1977. Operational assessment of hydrologic models of long-term persistence. Jour. Water Resour. Res., 13, 1, pp. 113-124.
Mandelbrot, B. B. and Wallis, J. R., 1968. Noah, Joseph and operational hydrology. Jour. Water Resour. Res., 4, 5, pp. 909-918.
Matalas, N. C., 1967. Mathematical assessment of synthetic hydrology. Jour. Water Resour. Res., 3, 4, pp. 937-945.
Matalas, N. C. and Wallis, J. R., 1971. Statistical properties of multivariate fractional noise processes. Jour. Water Resour. Res., 7, 6, pp. 1460-1468.
McLeod, A. I. and Hipel, K. W., 1978. Preservation of the rescaled adjusted range, 1. A reassessment of the Hurst phenomenon. Jour. Water Resour. Res., 14, 3, pp. 491-508.
Mejia, J. M., 1971. On the generation of multivariate sequences exhibiting the Hurst phenomenon and some flood frequency analyses. Ph.D. Dissertation, Colorado State University, Fort Collins, Colorado.
Mood, A. M., Graybill, F. A. and Boes, D. C., 1974. Introduction to the Theory of Statistics. Third Edition, McGraw-Hill, New York.
Moss, M. E. and Bryson, M. C., 1974. Autocorrelation structure of monthly streamflows. Jour. Water Resour. Res., 10, 4, pp. 737-744.
O'Connell, P. E., 1971.
A simple stochastic modeling of Hurst's law. In Mathematical Models in Hydrology, Warsaw Symposium, (IAHS Publ. 100, 1974), 1, pp. 169-187 O'Connor, K. M., 1976. A discrete linear cascade model for hydrology. Jour. Hydrology, 29, pp. 203-242 Pegram, G. G. S., 1978. Physical justification of continuous streamflow model. In Modeling Hydrologic Processes, Proceedings of the Fort Collins II International Hydrology Symposium, Edited by H. J. Morel-Seytoux, J.D. Salas, T. G. Sanders and R. E. Smith, pp 270-280. Quimpo, R. 1971. Structural relation between parametric and stochastic hydrology models. In Mathematical Models in Hydrology, Warsaw Symposium, (IASH Publ. 100, 1974), 1, pp. 151-157. Rodriguez-Iturbe, I., Mejia, J. M., and Dawdy, D. R., 1973. Streamflow simulation. 1. A new look at markovian models, fractional Gaussian noise and crossing theory Jour. Water Resour. Res., 8, 4, pp. 921-930. Salas, J. D., Boes, D. C., Yevjevich, V. and Pegram, G. G S., 1979. Hurst phenomenon as a pre-asymptotic behavior. Jour. Hydrology, 44, pp. 1-15 29Salas, J. D. and Smith, R. A., 1980a, Physical basis of stochastic models of annual flows. Paper accepted for publication in the Jour. Water Resour. Res Salas, J. D. and Smith, R. A., 1980b. Uncertainties in hydrologic time series analysis. Paper presented at the ASCE Spring Meeting, Portland, Oregon, Preprint 80-158 Selvalingam, S., 1978. ARMA and Linear tank models. In Modeling Hydrologic Processes, Proceedings of the Fort Collins III International Hydrology Symposium, Edited by H. J. Morel-Seytoux, J. D. Salas, T. G. Sanders and R. E. Smith, pp. 297-313. Sudler, C. E., 1927. Storage required for the regulation of streamflow. Trans. Amer. Soc. Civil Eng., 91, pp 622-660. Thomas, H. A. and Fiering, M. B., 1962. Mathematical synthesis of streamflow sequences for the analysis of river basins by simulation. In Design of Water Resource Systems (A. Mass et al., eds.), pp. 459-493, Cam- bridge, Massachusetts, Harvard University Press. 
Thomas, H. A., 1965. Personal communication to M. B Fiering. Examination questions for Engineering 250, Harvard University, Cambridge, Massachusetts. Valencia, D., and Schaake, J. C., 1973. Disaggregation process in stochastic hydrology. Jour. Water Resour. Res., 9, 3, pp. 580-585. Weiss, G., 1973. Filtered poisson processes as models for daily streamflow data. Ph.D. Thesis, Mathematics De- partment, Imperial College, London, quoted in "Flood Studies Report," Vol. I, 1975, Natural Environmental Research Council, London. Yakowitz, 8. J., 1973. A stochastic model for daily riverflow in an arid region. Jour, Water Resour. Res., 9, 5, pp 1271-1285. Yevjevich, V. M., 1963. Fluctuations of wet and dry years Part’ 1. Research data assembly and mathematical models. Hydrology Paper 1, Colorado State University, Fort Collins, Colorado. Yevjevich, V., 1965. The application of surplus, deficit and range in hydrology. Hydrology Paper 10, Colorado State University, Fort Collins, Colorado 30Chapter 2 CHARACTERISTICS OF HYDROLOGIC SERIES The main purpose of this chapter is to describe the basic and complex statistical characteristics of hydrologic series most commonly encountered in practice. The several types of hydrologic series are first discussed in Sec. 2.1. Section 2.2 describes the general properties of hydrologic time series and gives the equations to compute such properties. Sections 2.3 and 2.4 discuss the characteristics of annual and periodic univariate time series, respectively, while Sec. 2.5 deals with multivariate time series. Finally, the characteristics of intermittent time series are briefly discussed in Sec. 2.6 2.1. TYPES OF HYDROLOGIC SERIES The types of series most commonly encountered in hydrology are basically unidimensional (univariate) in time, line or counting series, and multidimensional (multipoint, multisite or multivariate) in time, line or counting series. 
Whenever the reference is time, the series is usually called a "time series."

2.1.1 TIME SERIES

Hydrologic time series can be divided into two basic groups: (1) single time series at a specified point, and (2) multiple time series at several points, or multiple series of different kinds at one point. Single time series are also called univariate series, while multiple time series are called multisite, multipoint or multivariate time series. In any case they constitute sets of mutually related time series of individual points along a line, over an area or across a space, or sets of time series of mutually related variables of various kinds.

Examples of single time series are annual precipitation or annual streamflow at a gaging station, monthly precipitation or monthly streamflow at a gaging station, average annual or monthly precipitation over an area, aggregated annual or monthly streamflow for a watershed system, etc. Examples of multiple time series are water quantity and related water quality variations in time, the series of annual or monthly precipitation at various gaging stations, the series of annual or monthly streamflow at various points of a river, variables that change over a river cross section for a given time, etc.

Single and multiple time series are further distinguished according to the time interval used, because the general characteristics, the methods of modeling and the estimation of parameters are related in various ways to the selected time interval. Basically, time intervals determine the following types of time series:

1. Continuous time series, for which variables are continuously recorded (time interval zero).
2. Time series of intervals that are a fraction of the day, such as hourly, 2-hour, 6-hour, 12-hour, etc. (examples are series of short-interval rainfall), which exhibit the daily and the annual cycle in their basic statistical characteristics in addition to random variations.
3.
Time series of intervals that are fractions of the year, such as day, week, month, season, or their multiples, with the annual cycle in their statistical characteristics in addition to random variations.
4. Annual time series, which by the summation or integration over the year do not exhibit cycles.

Experience shows that the estimation of models and parameters becomes simpler to perform as series go from the continuous or very short interval time series to large interval time series, and the simplest analysis is for the series with the time interval of the year.

2.1.2 LINE SERIES

The line series may also be divided into two basic groups:

1. single line series, such as those representing a random process of a property of river channels along the channel axis, or a porous media property along a well or a drill hole; and
2. multiple cross-sectional line series, obtained when several line series define the stochastic properties of an area or a space.

2.1.3 COUNTING SERIES

The counting series represent the result of sequential counting of the number of occurrences of a particular type, as random events in intervals of time, along a line, across an area or over a space. They can be either univariate or multivariate series. For instance, a counting series results from the number of rainy days during each month for a particular site or area.

2.2 GENERAL PROPERTIES OF HYDROLOGIC TIME SERIES

Univariate series are generally described by estimating their statistical characteristics such as the mean, standard deviation, skewness, probability distribution and time dependence structure. On the other hand, multivariate series require, besides the characteristics of each individual series, the estimation of the interrelationships among the series. Both single and multiple series (time, line or counting) are basically studied as discrete series of various intervals.
The analysis of series in time is the main subject of this text, although some principles and techniques discussed may be applied to the line and counting hydrologic series. Apart from a verbal description of the characteristics of common hydrologic series, the standard statistical calculations of some properties are given in this chapter.

2.2.1 COMPONENTS OF HYDROLOGIC SERIES

Hydrologic time series are often represented by components such as:

1. Overyear trends and other deterministic changes (such as slippages or jumps in parameters).
2. Cycles or periodic changes of the day and the year.
3. Almost periodic changes, such as tidal effects on hydrologic time series.
4. Components that represent the stochastic or random variations.

These four components can be explained and defined in various ways, as shown below.

Inconsistency (systematic errors) and nonhomogeneity (changes in nature by humans or by natural disruptive, evolutive or sudden processes) are mainly responsible for the overyear trends or for sudden changes (jumps, slippages). To properly determine the series characteristics, inconsistency and nonhomogeneity must first be identified and removed, because they are not expected to continue in the same form. They may either continue differently or may not continue at all in the future. The study of the operation of gaging stations with the changes at and around them, and the study of various environmental changes in river basins, should always support the statistically detected trends and jumps.

Apparent trends and cycles are often results of chance, namely of sampling fluctuations in a given time series. They are called apparent here because they may seem real for short samples, but in reality they are just part of the sampling variations when considered for larger sample sizes.
The apparent overyear trends and cycles are not population properties of hydrologic series, provided the known or inferred nonhomogeneities and inconsistencies in data are properly removed. The apparent trends and cycles must be tested to be statistically significant and physically justified before considering them as population characteristics of the time series. Reliable inferential statistical techniques should be applied for testing the significance of such characteristics.

Astronomic cycles are the basic causes of periodicity and almost periodicity in the characteristics of hydrologic time series. Periodicity means that statistical characteristics change periodically with the year. For some examples of periodicity in the mean, standard deviation, skewness coefficient and daily correlation, see Figures 2.1 through 2.4, respectively, for the daily flows of the Boise River, Idaho, USA. Tidal processes have harmonics (frequencies) that are induced jointly by various astronomical cycles (day, lunar month, solar year). They are almost periodic processes because of the noncommensurability of the frequencies of all the harmonic variations involved (the ratios of these three frequencies are not rational numbers). The almost periodic processes never repeat themselves identically as the periodic processes do. Astronomic cycles of the day and the year are present in all the hydrologic time series of time intervals that are fractions of the day or the year. Hydrologic variables affected by or dependent on tidal processes along coastal areas have time series with characteristics that follow the almost periodicities of tides.

Figure 2.1. Estimates $\bar{x}_\tau$ of Eq. (2.7) of the mean of daily flows, each resulting from 40 values of the 40-year long sample of the Boise River near Twin Springs, Idaho, USA.

Figure 2.2. Estimates $s_\tau$ of Eq.
(2.8) of the standard deviation of daily flows of the Boise River, Idaho, USA.

Figure 2.3. Estimates $g_\tau$ of Eq. (2.9) of the skewness coefficient of the daily flows of the Boise River, Idaho, USA.

Turbulence, large scale vorticity, heat conversion, atmospheric opacity for incoming and outgoing radiation waves, random thermodynamic processes, and many other processes in the earth's environments are responsible for randomness (stochasticity) in time series. These sources of randomness produce the variations in time series referred to as the stochastic components. Storage of water and heat in the earth's environments, and some resulting smoothing effects, are factors that attenuate the periodic and almost periodic processes and produce the time dependence in the stochastic variation.

Inputs to hydrologic environments are mostly a combination of periodic and stochastic variations that mutually interact. The earth's environments react to these inputs in three ways: (1) by smoothing or magnifying these inputs, (2) by adding, attenuating, amplifying or dampening some harmonics that describe the periodic components, and (3) by adding or modifying randomness resulting from various environmental factors.

Figure 2.4. Estimates $r_{1,\tau}$ of Eq. (2.10) of the correlation coefficients of daily flows of the Boise River, Idaho, USA.

2.2.2 BASIC STATISTICAL CHARACTERISTICS OF TIME SERIES

The reader is referred to any classical statistics book for information on various characteristics of statistical nature. Here, only the most common characteristics are defined.

The basic statistical characteristic of a time series $x_t$, $t = 1, \ldots, N$, is the sample mean $\bar{x}$ given by

$$\bar{x} = \frac{1}{N} \sum_{t=1}^{N} x_t \tag{2.1}$$

where $N$ is the sample size (length of a hydrologic time series). The sample mean $\bar{x}$ is the estimate of the population mean $\mu$ (the expected value, expectation).
It measures the central tendency of $x_t$, or determines where the series is located as a whole.

The second important statistical characteristic of a time series is the sample variance $s^2$ given by

$$s^2 = \frac{1}{N} \sum_{t=1}^{N} (x_t - \bar{x})^2 \tag{2.2a}$$

It is a biased estimate of the population variance $\sigma^2$ (see Sec. 3.1.1 for the definition of bias and related properties). The unbiased estimate is obtained by

$$s^2 = \frac{1}{N-1} \sum_{t=1}^{N} (x_t - \bar{x})^2 \tag{2.2b}$$

The estimate of Eq. (2.2b) is the one most generally used in statistical hydrology. The square root of $s^2$ is called the standard deviation. Related to the mean and standard deviation is the coefficient of variation, $\sigma/\mu$ and $s/\bar{x}$ for the population and the sample, respectively. Just as the mean measures the location of a time series $x_t$, the standard deviation measures the dispersion or the spread of the series around the mean $\bar{x}$. A small $s$ means that the values $x_1, x_2, \ldots, x_N$ do not differ much from $\bar{x}$, while a large $s$ generally means that the $x$'s have a large spread around $\bar{x}$.

The sample skewness coefficient of a time series may be determined by

$$g = \frac{\frac{1}{N} \sum_{t=1}^{N} (x_t - \bar{x})^3}{s^3} \tag{2.3a}$$

where $s$ is obtained from Eq. (2.2a). It is a biased estimate of the population skewness coefficient $\gamma$. An approximately unbiased estimate of $\gamma$ is determined by

$$g = \frac{N \sum_{t=1}^{N} (x_t - \bar{x})^3}{(N-1)(N-2)\, s^3} \tag{2.3b}$$

with $s$ obtained from Eq. (2.2b). The estimate of Eq. (2.3b) is the one most generally used in hydrology. The coefficient of skewness measures the asymmetry of a time series. If $g = 0$, the probability distribution of the series $x_t$ would be symmetric, centered around $\bar{x}$. If $g > 0$ the distribution is skewed to the right (longer right tail), while $g < 0$ indicates a distribution skewed to the left.

The autocovariance function measures the degree of linear autodependence (self-dependence) of a time series. The autocovariance $c_k$ between $x_t$ and $x_{t+k}$ may be determined by

$$c_k = \frac{1}{N} \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}), \qquad 0 \le k < N \tag{2.4}$$

where $c_k$ is usually called the lag-$k$ autocovariance, $k$ represents the time lag (or distance) between the correlated pairs $(x_t, x_{t+k})$, $\bar{x}$ is the sample mean of Eq. (2.1) and $N$ is the sample size. For the particular case $k = 0$, $c_0$ becomes the variance $s^2$ of Eq. (2.2a). The sample autocovariance $c_k$ of Eq. (2.4) is a biased estimate of the population autocovariance function $\gamma_k$. An unbiased estimate can be obtained by using $(N-k)$ instead of $N$ in the denominator of Eq. (2.4). In either case such estimators are referred to as the open series estimators. Notice that these estimators have only $N-k$ terms in the cross-products of Eq. (2.4). An approach that considers $N$ cross-product terms is referred to as the circular series approach. Appendix A8.1 gives covariance estimators based on the circular series approach.

A dimensionless measure of linear dependence is obtained by dividing $c_k$ of Eq. (2.4) by $c_0$. Such operation gives

$$r_k = \frac{c_k}{c_0} = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2} \tag{2.5a}$$

where $r_k$ is called the (lag-$k$) autocorrelation coefficient, the serial correlation coefficient or the autocorrelation function (ACF). The plot of $r_k$ versus $k$ is generally called the correlogram. The sample autocorrelation coefficient $r_k$ is an estimate of the population coefficient $\rho_k$. The simple measure of time dependence most commonly used is the first serial correlation coefficient, $r_1$ or $\rho_1$ for the sample or the population, respectively. An alternative estimate of the autocorrelation function $\rho_k$ is

$$r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x}_t)(x_{t+k} - \bar{x}_{t+k})}{\left[ \sum_{t=1}^{N-k} (x_t - \bar{x}_t)^2 \, \sum_{t=1}^{N-k} (x_{t+k} - \bar{x}_{t+k})^2 \right]^{1/2}} \tag{2.5b}$$

where $\bar{x}_t$ is the mean of the first $N-k$ values $x_1, \ldots, x_{N-k}$ and $\bar{x}_{t+k}$ is the mean of the last $N-k$ values $x_{k+1}, \ldots, x_N$. Equations (2.5a) and (2.5b) give $r_0 = 1$ for $k = 0$, so the correlogram always starts at unity at the origin. In general $-1 \le r_k \le +1$. The estimate $r_k$ of Eq. (2.5b) is the maximum likelihood estimate of $\rho_k$ when $(x_t, x_{t+k})$ is bivariate normal (Jenkins and Watts, 1969). Although it is a good estimate of $\rho_k$ when considered individually, that is not the case when several estimates, say $r_1, r_2, \ldots$, are needed. Furthermore, the estimates of Eq.
(2.5b) are not positive definite (a positive definite autocorrelation matrix is an important property of stationary time series). On the other hand, the estimates $r_k$ of Eq. (2.5a) are positive definite. Both estimates $r_k$ of Eqs. (2.5a) and (2.5b) are biased downward, namely the average $\bar{r}_k$ of values $r_k$ computed from many series of size $N$ is not equal to the population value $\rho_k$. The mean $\bar{r}_k$ is smaller than $\rho_k$, and the difference increases as the value $\rho_k$ increases and the sample size $N$ decreases. Biased estimates of the population autocorrelation are a disadvantage in practice, because they may often induce incorrect inferences about the characteristics of a time series. The bias in the estimates $r_k$ may be removed by using Quenouille's correction (Kendall and Stuart, 1968, p. 435)

$$\tilde{r}_k = 2 r_k - 0.5 \left[ r_k(1) + r_k(2) \right] \tag{2.6}$$

where $\tilde{r}_k$ is the corrected "unbiased" serial correlation estimate, $r_k$ is the original serial correlation estimate of either Eq. (2.5a) or (2.5b), and $r_k(1)$ and $r_k(2)$ are the serial correlation estimates of the first half and the second half of the time series. A disadvantage of Eq. (2.6) is that sometimes $\tilde{r}_k$ takes on values beyond the limits $(+1, -1)$.

In addition to the basic statistical characteristics of a time series such as the mean, variance, skewness and autocorrelation, we may be interested in the overall probability distribution of $x_1, \ldots, x_N$. Such distribution may be determined either by the sample frequency distribution or by fitting a probability distribution function to the frequency distribution. Continuous distribution functions such as the normal, lognormal, gamma and loggamma functions are generally used in practice.

2.2.3 COMPLEX CHARACTERISTICS OF PERIODIC TIME SERIES

The statistical characteristics such as the mean, variance, skewness coefficient and serial correlation of periodic hydrologic time series can be determined by Eqs. (2.1) through (2.5).
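As an illustration of the estimators of Sec. 2.2.2, the following is a minimal Python sketch, not a reference implementation. The function names are hypothetical, the autocorrelation uses the open-series form of Eq. (2.5a), and the Quenouille correction of Eq. (2.6) is assumed to split the series at its midpoint.

```python
import math

def sample_mean(x):
    """Eq. (2.1): arithmetic mean of the series."""
    return sum(x) / len(x)

def sample_variance(x, unbiased=True):
    """Eq. (2.2b) when unbiased (divisor N-1), otherwise Eq. (2.2a) (divisor N)."""
    n = len(x)
    m = sample_mean(x)
    ss = sum((v - m) ** 2 for v in x)
    return ss / (n - 1) if unbiased else ss / n

def skewness(x, unbiased=True):
    """Eq. (2.3b) when unbiased, otherwise Eq. (2.3a)."""
    n = len(x)
    m = sample_mean(x)
    cubes = sum((v - m) ** 3 for v in x)
    s = math.sqrt(sample_variance(x, unbiased))
    if unbiased:
        return n * cubes / ((n - 1) * (n - 2) * s ** 3)
    return (cubes / n) / s ** 3

def autocovariance(x, k):
    """Eq. (2.4): open-series lag-k autocovariance with divisor N."""
    n = len(x)
    m = sample_mean(x)
    return sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n

def autocorrelation(x, k):
    """Eq. (2.5a): r_k = c_k / c_0."""
    return autocovariance(x, k) / autocovariance(x, 0)

def quenouille(x, k):
    """Eq. (2.6): 2 r_k - 0.5 [r_k(first half) + r_k(second half)].
    Assumption: the halves are x[:N//2] and x[N//2:]."""
    half = len(x) // 2
    r_first = autocorrelation(x[:half], k)
    r_second = autocorrelation(x[half:], k)
    return 2.0 * autocorrelation(x, k) - 0.5 * (r_first + r_second)
```

For example, `autocorrelation(flows, 1)` would give the first serial correlation coefficient of a list `flows` of annual values, and `quenouille(flows, 1)` its bias-corrected counterpart, which may fall outside the interval from -1 to +1 as noted above.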
However, such equations would only give the statistical characteristics as a whole and they will not show the effect of the annual cycle (except for the case of $r_k$). In order to take such effect into account, the characteristics must be determined for each time interval within the year.

Consider the periodic time series $x_{\nu,\tau}$, where $\nu$ denotes the year and $\tau$ denotes the time interval within the year. The sample mean for the time interval $\tau$ is determined by

$$\bar{x}_\tau = \frac{1}{N} \sum_{\nu=1}^{N} x_{\nu,\tau}, \qquad \tau = 1, \ldots, \omega \tag{2.7}$$

where $N$ is the number of years of record and $\omega$ is the number of time intervals in the year. The sample mean $\bar{x}_\tau$ is an estimate of the population mean $\mu_\tau$. The sample variance for time $\tau$ is given by

$$s_\tau^2 = \frac{1}{N-1} \sum_{\nu=1}^{N} (x_{\nu,\tau} - \bar{x}_\tau)^2, \qquad \tau = 1, \ldots, \omega \tag{2.8}$$

It is an estimate of the population variance $\sigma_\tau^2$. Similarly, the sample skewness coefficient for time $\tau$ is

$$g_\tau = \frac{N \sum_{\nu=1}^{N} (x_{\nu,\tau} - \bar{x}_\tau)^3}{(N-1)(N-2)\, s_\tau^3}, \qquad \tau = 1, \ldots, \omega \tag{2.9}$$

where $\bar{x}_\tau$ is given by Eq. (2.7) and $s_\tau$ is obtained from Eq. (2.8). The skewness coefficient $g_\tau$ of Eq. (2.9) is an estimate of the population skewness coefficient $\gamma_\tau$. Although Eqs. (2.8) and (2.9) are most generally used in statistical hydrology for estimating the variance and the skewness coefficient, for certain cases we may wish to determine the biased estimates by applying Eqs. (2.2a) and (2.3a), respectively, for each time period $\tau$.

The correlation structure of the time series $x_{\nu,\tau}$ may be determined for each time interval $\tau$ by

$$r_{k,\tau} = \frac{\frac{1}{N} \sum_{\nu=1}^{N} (x_{\nu,\tau} - \bar{x}_\tau)(x_{\nu,\tau-k} - \bar{x}_{\tau-k})}{s_\tau \, s_{\tau-k}} \tag{2.10}$$

where $r_{k,\tau}$ is the sample lag-$k$ correlation coefficient, which is an estimate of the population correlation coefficient $\rho_{k,\tau}$. When $\tau - k < 1$ in Eq. (2.10), $N$ is replaced by $N-1$ (the sum then runs over $\nu = 2, \ldots, N$), $x_{\nu,\tau-k}$ is replaced by $x_{\nu-1,\omega+\tau-k}$, and $\bar{x}_{\tau-k}$ and $s_{\tau-k}$ are replaced by $\bar{x}_{\omega+\tau-k}$ and $s_{\omega+\tau-k}$.
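The within-year estimators of Eqs. (2.7) through (2.10) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the record is held as a list of N years, each a list with one value per within-year interval; the function names are hypothetical; and Eq. (2.10) is assumed to divide the cross-product sum by the number of pairs while the standard deviations use N - 1, so perfectly correlated data give a value slightly below one.

```python
import math

def periodic_stats(x):
    """Per-interval mean, standard deviation and skewness, Eqs. (2.7)-(2.9).
    x[v][tau] is the value for year v and within-year interval tau.
    Requires N >= 3 years and a nonzero spread in every interval."""
    n, omega = len(x), len(x[0])
    means, stds, skews = [], [], []
    for tau in range(omega):
        col = [x[v][tau] for v in range(n)]
        m = sum(col) / n
        s = math.sqrt(sum((c - m) ** 2 for c in col) / (n - 1))
        g = n * sum((c - m) ** 3 for c in col) / ((n - 1) * (n - 2) * s ** 3)
        means.append(m)
        stds.append(s)
        skews.append(g)
    return means, stds, skews

def periodic_lag_corr(x, k, means, stds):
    """Lag-k correlation for each interval tau, Eq. (2.10), for 1 <= k <= omega.
    When tau - k falls in the previous year, the pairs start at the second
    year, so only N - 1 cross-products are available."""
    n, omega = len(x), len(x[0])
    out = []
    for tau in range(omega):
        prev = tau - k
        if prev >= 0:
            pairs = [(x[v][tau], x[v][prev]) for v in range(n)]
        else:
            prev += omega
            pairs = [(x[v][tau], x[v - 1][prev]) for v in range(1, n)]
        cov = sum((a - means[tau]) * (b - means[prev]) for a, b in pairs) / len(pairs)
        out.append(cov / (stds[tau] * stds[prev]))
    return out
```

With monthly data, each inner list would hold 12 values, and `periodic_lag_corr(x, 1, means, stds)` would return the 12 month-to-month correlation coefficients.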
The estimates $\bar{x}_\tau$, $s_\tau^2$, $g_\tau$ and $r_{k,\tau}$ are generally called the periodic or seasonal statistical characteristics of the series $x_{\nu,\tau}$. While most analyses of hydrologic series with intervals that are fractions of the year currently consider periodicities in the mean and in the standard deviation, and sometimes in the first serial correlation coefficient, the analysis of periodicities in the skewness coefficient is rarely undertaken. In essence, the periodic hydrologic time series represent, from a rigorous statistical viewpoint, a multivariate process (say, composed of a set of 365 variables for the case of daily flows) with a different probability distribution for each time interval of the year. The periodicity is induced by the annual cycle of revolution of the earth around the sun, with the corresponding seasonal changes likely to be present in all the characteristics of the hydrologic time series. Similarly, the series of intervals of a fraction of the day exhibit the daily cycle along with the annual cycle.

2.2.4 DROUGHT RELATED CHARACTERISTICS OF TIME SERIES

In addition to the statistical characteristics discussed in previous sections, some other characteristics of time series are particularly related to droughts. Consider the time series $x_1, \ldots, x_N$ and a constant demand level $y$ (crossing level) as shown in Fig. 2.5. A negative run occurs when $x_t$ is less than $y$ consecutively during one or more time intervals. Similarly, a positive run occurs when $x_t$ is consecutively greater than $y$. We concentrate only on negative runs since they are related to drought characteristics.

Figure 2.5. Definitions of run-length and run-sum.

A run can be determined by its length, its sum or its intensity. For instance, a negative run of length 4 and run-sum equal to $d$ is shown in Fig. 2.5. In general, several runs result in a time series of given demand level and sample size. Assume that $M$ runs of run-lengths $\ell(1), \ldots, \ell(M)$ and run-sums $d(1), \ldots, d(M)$ occur.
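The run definitions above can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, and the run-sum is assumed to be the accumulated deficit $y - x_t$ over the run, which the text does not state explicitly.

```python
import math

def negative_runs(x, y):
    """Return (lengths, sums) of the negative runs of x below demand level y.
    Assumption: the run-sum is the accumulated deficit y - x_t over the run."""
    lengths, sums = [], []
    length, deficit = 0, 0.0
    for value in x:
        if value < y:
            # the run continues (or starts) while the series is below y
            length += 1
            deficit += y - value
        elif length > 0:
            # the run has just ended; store it and reset the counters
            lengths.append(length)
            sums.append(deficit)
            length, deficit = 0, 0.0
    if length > 0:
        # a run still open at the end of the sample is kept as observed
        lengths.append(length)
        sums.append(deficit)
    return lengths, sums

def summarize(values):
    """Mean, standard deviation (divisor M-1) and maximum of M run values."""
    m = len(values)
    mean = sum(values) / m
    if m > 1:
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / (m - 1))
    else:
        std = 0.0
    return mean, std, max(values)
```

Calling `summarize` on the list of run-lengths, and again on the list of run-sums, gives the summary characteristics of each for the chosen demand level and sample.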
The means, the standard deviations and the maxima of run-length and run-sum are important characteristics describing the runs of a given time series. For instance, for the run-length such characteristics are obtained by

$$\bar{\ell}_N = \frac{1}{M} \sum_{j=1}^{M} \ell(j) \tag{2.11}$$

$$s_N(\ell) = \left[ \frac{1}{M-1} \sum_{j=1}^{M} \left( \ell(j) - \bar{\ell}_N \right)^2 \right]^{1/2} \tag{2.12}$$

$$\ell_N^* = \max \left( \ell(1), \ldots, \ell(M) \right) \tag{2.13}$$

The histogram of the $\ell$'s is also important for describing the overall distribution of runs. Notice that these characteristics are defined for a given sample size $N$ and for a given demand level $y$. As $N$ increases and/or $y$ increases, $\bar{\ell}_N$, $s_N(\ell)$ and $\ell_N^*$ also increase.

It must be noted that for given $N$ and $y$, the above characteristics $\bar{\ell}_N$, $s_N(\ell)$ and $\ell_N^*$ are random variables. For instance, if for a given sample $x_1, \ldots, x_N$ of size $N = 100$ years and demand $y = 0.8\,\bar{x}$, with $\bar{x}$ the mean of the $x$'s, Eq. (2.13) gives $\ell_{100}^* = 5$ years, it does not mean that $\ell_{100}^*$ will also be 5 years for another sample of the same size and the same demand. It may be more or less than 5 years. The above run definitions and concepts can be used for both the annual series and the periodic series. In the case of periodic series, the demand level may also be periodic. The run-length characteristics $\bar{\ell}_N$, $s_N(\ell)$ and $\ell_N^*$ may be used for comparison with the corresponding characteristics derived from mathematical models fitted to historical series.

2.2.5 STORAGE RELATED CHARACTERISTICS OF TIME SERIES

Some characteristics of time series are particularly related to reservoir storage problems. Since such characteristics are functions of the dependence structure of a series, they are also useful for identifying the degree of time dependence of a series. Consider a time series $x_1, \ldots, x_N$ and form the sequence

$$S_i = S_{i-1} + (x_i - \bar{x}_N), \qquad i = 1, \ldots, N \tag{2.14}$$

where $S_i$ is called the partial sum and $\bar{x}_N$ is the sample mean determined by

$$\bar{x}_N = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{2.15}$$

Define the sample standard deviation by

$$s_N = \left[ \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x}_N)^2 \right]^{1/2} \tag{2.16}$$

and the range (rescaled range) of partial sums by

$$R_N = \frac{1}{s_N} \left[ \max (S_0, S_1, \ldots, S_N) - \min (S_0, S_1, \ldots, S_N) \right] \tag{2.17}$$

with $S_0 = 0$.
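A minimal Python sketch of the partial sums and their range follows. It assumes that the partial sums accumulate departures from the sample mean, that the standard deviation uses the divisor N, and that the rescaled range is the range of partial sums divided by that standard deviation; the function name is hypothetical.

```python
import math

def rescaled_range(x):
    """Return (range of partial sums, rescaled range) for the series x.
    Assumptions: S_0 = 0, S_i = S_{i-1} + (x_i - mean), and the range is
    rescaled by the standard deviation computed with divisor N."""
    n = len(x)
    mean = sum(x) / n
    partial = [0.0]                      # S_0 = 0
    for value in x:
        partial.append(partial[-1] + (value - mean))
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    spread = max(partial) - min(partial)
    return spread, spread / std
```

The first returned value is in the units of the series (a storage volume if x is a volume per interval), while the second is dimensionless and can be compared across samples and models.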
Recall that the plot of $S_i$ versus $i$, $i = 1, \ldots, N$, is the typical mass curve or Rippl diagram, from which the minimum storage capacity of a reservoir to deliver $\bar{x}_N$ throughout the time interval $N$ can be obtained. Since $R_N$ of Eq. (2.17) is related to the minimum storage capacity needed, it may be a useful statistic for testing whether a model represents the "storage characteristics" of the historical time series.

Assume that several samples of size $N$ are available, from which the mean $\bar{R}_N$ is obtained. The type of variation of $\bar{R}_N$ with $N$ may indicate whether the series is of short memory, long memory or infinite memory. Suppose that $\bar{R}_N$ versus $N$ is plotted on log-log paper and a straight line is fitted through the points. Then $\bar{R}_N \sim N^{h_N}$, where $h_N$ is the slope of the fitted line. The slope $h_N$ varies with $N$, but as $N \to \infty$ either $h_N \to 1/2$ or $h_N \to h \ne 1/2$. We consider that a time series is of short memory if $h_N \to 1/2$ fairly fast, while it is of long memory if $h_N \to 1/2$ slowly. On the other hand, the time series is of infinite memory if $h_N \to h \ne 1/2$. These concepts apply to hydrologic series or to series derived from known stochastic models.

2.2.6 NONHOMOGENEITY AND INCONSISTENCY IN HYDROLOGIC SERIES

Hydrologic characteristics are generally subject to changes due to nonhomogeneity and inconsistency. Nonhomogeneity in data is common in hydrologic time series; it is induced by humans or produced by significant natural disruptive factors, evolutive or sudden (such as natural disasters). In addition, hydrologic data may have significant systematic errors producing inconsistent series. Several characteristics of time series such as the mean, standard deviation and serial correlations may be affected whenever a trend and/or a positive or negative jump (slippage) are produced in hydrologic series by nonhomogeneity and inconsistency.
The identification or detection, description and removal of nonhomogeneity and inconsistency are important aspects of time series analysis. They are most reliable if changes are substantiated by both statistical tests and physical or historical evidence and justification.

The identification and description of the characteristics of changes in hydrologic time series (because of inconsistency and nonhomogeneity) are based on: (1) fitting a trend function and testing that its parameters are significantly different from zero; and (2) testing that the basic statistical characteristics of subseries of the sample series are statistically different among themselves.

The trend analysis assumes a monotonic function expanded in power series form as

$$x_t = b_0 + b_1 t + b_2 t^2 + \cdots + b_m t^m \tag{2.18}$$

where $b_0, b_1, \ldots, b_m$ are the parameters to be estimated. Only when any of the parameters $b_1, \ldots, b_m$ is found to be significantly different from zero, with $b_0$ a constant or zero, does a linear or a nonlinear trend become a characteristic of the series. The verification process of the trend should ascertain whether the history of data collection (various sources of systematic errors), and the history of river basin developments or the other natural factors (various sources of nonhomogeneity), support and justify the final acceptance of the inferred trend.

The second technique divides the historic time series into two or more subseries, and the main statistical characteristics are estimated for each subseries. The breaking points of subseries should be the times of hypothesized change of the characteristics, say the times of change of observational technique, the times of the start of various projects that change the water regime, or the time at which a sudden change may exist in the mean, in the standard deviation and in the correlation of the series.
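For the first technique, the linear case (m = 1) of Eq. (2.18) can be sketched in Python as below. This is only an illustrative sketch under stated assumptions: ordinary least squares is assumed as the fitting method (the text does not name one), time is indexed t = 1, ..., N, and the function name is hypothetical. The returned t-statistic of the slope would be compared with a Student t critical value for N - 2 degrees of freedom taken from tables.

```python
import math

def linear_trend(x):
    """Least-squares fit of x_t = b0 + b1 * t for t = 1..N.
    Returns (b0, b1, t_stat), with t_stat = b1 / standard error of b1."""
    n = len(x)
    times = range(1, n + 1)
    t_mean = (n + 1) / 2.0
    x_mean = sum(x) / n
    stt = sum((t - t_mean) ** 2 for t in times)
    stx = sum((t - t_mean) * (v - x_mean) for t, v in zip(times, x))
    b1 = stx / stt
    b0 = x_mean - b1 * t_mean
    # residual sum of squares around the fitted line
    sse = sum((v - b0 - b1 * t) ** 2 for t, v in zip(times, x))
    se_b1 = math.sqrt(sse / (n - 2) / stt)
    if se_b1 == 0.0:
        # perfect fit: the statistic is unbounded unless the slope is zero
        t_stat = math.inf if b1 != 0.0 else 0.0
    else:
        t_stat = b1 / se_b1
    return b0, b1, t_stat
```

A statistically significant slope would still need the physical or historical support discussed in the text before being accepted as a trend.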
Various techniques are available for the comparison of statistical characteristics of subseries and for testing whether or not they are statistically different among themselves. Figure 2.6 shows the inferred trend in the annual flow series of the Colorado River at Lee Ferry, Arizona.

Figure 2.6. Annual flow series of the Colorado River at Lee Ferry, Arizona, 1896-1959: (1) historical annual flow record, with 1896-1921 period data estimated from other stations by correlation, (2) the arithmetic mean for the period 1896-1959 (64 years), (3) the arithmetic mean for the historic observed period 1922-1959, and (4) an approximate trend in the period 1922-1959. (After Yevjevich, 1972.)

To illustrate the method of determining whether a change exists as the basic characteristic of a time series, the example of the time series of the net basin supply (NBS) into the Great Lakes is given in some detail. This example tests homogeneity by splitting the sample into two unequal subsamples and testing whether differences between the means of the subsamples are significantly different from zero at the 95 percent probability level. Only if the probability is less than 5 percent for a difference to be by chance is the sample considered to be nonhomogeneous. The five mean annual NBS series of Lakes Ontario, Erie, Michigan-Huron, Superior and St. Clair are used for the homogeneity test. The classical t-statistic is used for testing whether the difference of two means $\bar{x}_1$ and $\bar{x}_2$ is significant, that is

$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{s \left( \frac{1}{N_1} + \frac{1}{N_2} \right)^{1/2}} \tag{2.19}$$

with

$$s = \left[ \frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2} (x_j - \bar{x}_2)^2}{N_1 + N_2 - 2} \right]^{1/2} \tag{2.20}$$

where $N_1$ and $N_2$ are the subsample sizes, $x_i$ are the sample values in the $N_1$ subsample and $x_j$ those in the $N_2$ subsample. The variable $t$ of Eq. (2.19) follows the Student t-distribution with $(N_1 + N_2 - 2)$ degrees of freedom. The critical value $t_c$ for the 95 percent probability level is taken from the Student t-tables.
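The split-sample test of the means can be sketched in Python as follows. This is a hedged illustration that assumes the classical pooled-variance form of the two-sample t-statistic with N1 + N2 - 2 degrees of freedom and reports the absolute value of the statistic; the function name is hypothetical, and the critical value is left to be read from Student t-tables.

```python
import math

def two_sample_t(a, b):
    """Pooled-variance t-statistic for the difference of two subsample means.
    Returns (t, degrees_of_freedom); t is reported as an absolute value."""
    n1, n2 = len(a), len(b)
    m1 = sum(a) / n1
    m2 = sum(b) / n2
    # pooled sum of squares around each subsample mean
    ss = sum((v - m1) ** 2 for v in a) + sum((v - m2) ** 2 for v in b)
    dof = n1 + n2 - 2
    s = math.sqrt(ss / dof)
    t = abs(m1 - m2) / (s * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, dof
```

The subsample means would be judged different at the chosen probability level when the returned `t` exceeds the tabulated critical value for `dof` degrees of freedom.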
Similar equations as above, or equations based on the F-test, can be used for testing whether the variances $s_1^2$ and $s_2^2$ of subsamples are significantly different.

Table 2.1 gives results of the homogeneity test for subsample means, with only the series of annual NBS of Lake St. Clair found to be nonhomogeneous. The means of the subsamples are $\bar{x}_1 = 3.06$ and $\bar{x}_2 = 5.24$. Then the NBS series for Lake St. Clair may be corrected with $\bar{x} = 5.24$ of the last 26 years of record also used as the mean for the first 43 years.

Table 2.1. Example of Testing for Changes in the Mean of the Annual NBS of the Great Lakes System.

             Subsample sizes     t-statistic (95%)           Change in the
Lake           N1      N2       From t-tables   Computed     mean (95%)
Ontario        36      33           2.0           0.299         No
Erie           36      33           2.0           0.635         No
Superior       36      33           2.0           1.525         No
Michigan       36      33           2.0           0.866         No
St. Clair      43      26           2.0           4.477         Yes

The test of homogeneity in the standard deviation of the NBS series showed all five series to be homogeneous in the standard deviation. Under natural conditions, when a hydrologic series has no significant trend or jump in the mean and the standard deviation, usually the other statistical characteristics such as the skewness and correlations do not show significant changes either.

2.3 CHARACTERISTICS OF ANNUAL TIME SERIES

Annual time series are the simplest series in hydrology as concerns their general statistical characteristics. For instance, the precipitation observed at 1141 stations in the United States showed that the annual precipitation time series are very close to being time independent, stationary stochastic processes for the period of the most reliable data (last 70-100 years). The independence means that the outcome of precipitation in a year does not depend on precipitation values of previous years. The stationarity means that the basic properties of a process do not change with the absolute time. The time dependence measured by the first serial correlation coefficient $r_1$ of annual precipitation series computed by Eq.
(2.5b) and averaged over the total number of series (1141) is on the order of 0.028 for the period of simultaneous observations of 30 years (1931-1960). For all years of observations available at these stations, with an average series length of 54 years, the mean $r_1$ value is 0.055 (Yevjevich, 1964). These small values imply that, on the average, less than one percent of the total variation of annual precipitation in any year (since $r_1^2 = 0.055^2 \approx 0.003$) may be due to the annual precipitation of the previous years. Therefore, annual precipitation can be considered in most cases as an independent time series for time spans of decades.

Figure 2.7 shows the series of annual precipitation at Fort Collins, Colorado, USA, for 92 years (1887-1978). The first serial correlation coefficient for this series is $r_1 = -0.20$. Such a negative value of $r_1$ may be considered as a large sampling deviation from the population value $\rho_1 = 0$ for independent series.

Figure 2.7. Annual precipitation series at Fort Collins, Colorado, for the period 1887-1978 (92 years), in modular coefficients $P_i/\bar{P}$, with $P_i$ = the annual values and $\bar{P}$ = 14.57 inches = the annual mean.

Annual runoff series are either time independent or time dependent stochastic processes. When the changes in the total stored water of a river basin at the end of each water year are negligible, the series are independent. They are dependent when the storage at the end of the year has relatively large fluctuations in comparison with the average annual flow. Large variations in total water carryover in river basins from year to year may be considered as the principal physical factor that affects the time dependence of both the annual runoff and the annual evaporation. The average of the first serial correlation coefficients for 140 selected worldwide annual runoff series, with a mean series length of 55 years, gave $\bar{r}_1 = 0.175$.
Similarly, the average for 446 annual runoff series in western North America, with a mean series length of 87 years, gave $\bar{r}_1 = 0.197$ (Yevjevich, 1964). Both sets of data showed that the average first serial correlation coefficient of annual runoff series is about $\bar{r}_1 = 0.20$, which is statistically different from zero. Figure 2.8 shows the annual runoff series of the Danube River at Orsova, Romania, for 120 years (1837-1957), as an example of a somewhat dependent annual series (approximately $r_1 = 0.10$).

Figure 2.8. Annual runoff series of the Danube River at Orsova, Romania, for 120 years (1837-1957) in modular coefficients $Q_i/\bar{Q}$, with $Q_i$ = the annual values and $\bar{Q}$ = the annual mean.

The dependence characteristics of annual time series are basically investigated and presented by using two classical statistical computations and relations: (1) the correlogram, which is a representation in the time domain, and (2) the spectrum, which is a representation in the frequency domain. For an independent series the population correlogram is equal to zero for $k \neq 0$. However, because of sampling variability, samples of independent time series have $r_k$ values that fluctuate around zero and are not necessarily equal to zero. In such a case it is useful to determine the probability limits for the correlogram of an independent series. Anderson (1941) gave the limits

$$r_k(95\%) = \frac{-1 \pm 1.96\sqrt{N-k-1}}{N-k} \tag{2.21a}$$

and

$$r_k(99\%) = \frac{-1 \pm 2.326\sqrt{N-k-1}}{N-k} \tag{2.21b}$$

for the 95 percent and 99 percent probability levels, respectively, where N is the sample size.

Figure 2.9 shows correlograms $r_k$ of Eq. (2.5b) of annual runoff for four large European rivers, with probability limits computed by Eq. (2.21a) for the two cases N = 150 and N = 120. The $r_1$ values for the Rhine and Danube Rivers are greater than zero, but within the probability limits. For the Göta and Nemunas Rivers the $r_1$ coefficients are outside these limits. Of the other $r_k$ coefficients, less than 5 percent of the computed values fall outside the limits.

Figure 2.9. Correlograms of annual runoff series of four European rivers: the Göta River, Sweden (N = 150), the Nemunas River, Lithuania (N = 132), the Rhine River at Basle, Switzerland (N = 150), and the Danube River at Orsova, Romania (N = 120). Probability limits at the 95 percent level are given for normal independent variables for two lengths, N = 150 (max) and N = 120 (min) (Yevjevich, 1964).

Figure 2.10 shows the correlogram of the annual runoff series of the St. Lawrence River, the correlogram of the annual effective precipitation series on the basin, the correlogram of the first order autoregressive model fitted to the runoff correlogram (see Eq. 4.14) and the limits for the independent series determined by Eq. (2.21a). The correlograms show a highly dependent annual runoff series and an independent annual effective precipitation series.

Figure 2.10. Correlograms of the St. Lawrence River at Ogdensburg, New York: (1) $r_k$ of annual runoff series; (2) $r_k$ of annual effective precipitation series; (3) correlogram of the first order autoregressive model; and (4) probability limits at the 95 percent level for normal independent variables with N = 97.

The spectrum of annual time series may be determined by transforming the correlogram $r_k$ as (Yevjevich, 1972)

$$g(f) = 2\left[1 + 2\sum_{k=1}^{m} D_k\, r_k \cos(2\pi f k)\right] \tag{2.22}$$

where $g(f)$ is the smoothed (weighted) sample spectral density, f is the ordinary frequency (equal to 1/2m), k is the lag, m is the maximum number of k's used (often N/6 to N/4), and $D_k$ is a smoothing function. For instance, Parzen (1967) gives

$$D_k = \begin{cases} 1 - 6\left(\dfrac{k}{m}\right)^2 + 6\left(\dfrac{k}{m}\right)^3, & \text{for } k \le m/2 \\[2ex] 2\left(1 - \dfrac{k}{m}\right)^3, & \text{for } m/2 < k \le m \end{cases} \tag{2.23}$$

For an independent annual series, $g(f)$ of Eq.
(2.22) should not be statistically different from 2.00 over the range f = 0.0 to 0.50. Figure 2.11 gives the average spectrum of the annual precipitation series of 231 gaging stations in an area of the United States, for series lengths varying between 35 and 150 years. It shows that the annual precipitation series are, for all practical purposes, independent (and therefore) temporarily stationary time processes.

Figure 2.11. Average variance spectrum of 231 annual (homogeneous) precipitation series of the northwest area of the United States (area between longitudes 94° and 85°, and between latitude 36.5° and the Canadian-USA border).

Figure 2.12 shows the estimated spectral densities $g(f)$, line (1), and a fitted spectrum, line (2), of the second order autoregressive model for the annual flow series of the Göta River for N = 150 years. The resulting residuals of spectral densities, line (3), are added to the expected value 2 of an independent series.

Figure 2.12. Spectra of annual flow series of the Göta River at Vänersborg, Sweden, for 150 years: (1) estimated spectrum, (2) fitted spectrum of the second order autoregressive model, (3) spectrum of residuals, and (4) expected spectrum of the independent series.

Summarizing, the analysis of historical records of annual hydrologic series leads to the following conclusions:

1. Processes of annual precipitation, annual evaporation, annual effective precipitation on river basins (precipitation minus evaporation), annual runoff from river basins, and similar hydrologic processes may be considered in most cases as approximately temporarily stationary stochastic processes, provided the systematic errors in observed data (inconsistency) and the human-induced changes and natural accidental disruptions (nonhomogeneity in data) are properly taken into account.

2. The major time dependence in hydrologic annual series is produced by the complex geophysical processes of water storage in river basins, with their random fluctuations from year to year and periodic and stochastic fluctuations within the year.

3. The longer a series, the greater is the probability of some nonhomogeneity being present in the data, produced either by human activities or by accidental disruption in nature, plus some systematic errors (inconsistency).

4. There exist some observed hydrologic series exhibiting changes in the statistical characteristics which do not appear to be produced by nonhomogeneities or inconsistencies. However, further investigations are necessary in order to substantiate claims that such changes are produced by some localized or regional climatic changes.

2.4 CHARACTERISTICS OF PERIODIC TIME SERIES

The statistical characteristics of periodic series are discussed assuming that any inconsistency and nonhomogeneity were first identified and removed from the original series. Periodic hydrologic time series such as seasonal, monthly, weekly and daily series share some statistical characteristics with annual series and differ in others. The basic difference is that periodic series, in most cases known in nature, have significant periodic behavior in the mean, standard deviation and skewness. In addition to these periodicities, they show a time correlation structure which may be either constant or periodic. Consequently, such time dependence may be represented by, say, autoregressive models with constant coefficients as in the case of annual series, or by models with periodic coefficients. The periodic statistical characteristics can be determined by Eqs. (2.7) through (2.10).

The plot of the periodic series versus time gives a good indication of its main statistical characteristics. For example, the daily flows of the Boise River near Twin Springs, Idaho for the year 1921 as given in Fig.
2.13 shows a typical behavior in which the flows are low during some days of the year and high during others. During the days of low flows the variability is small; during high flows, on the other hand, the variability is large. This characteristic behavior of daily flows indicates that the daily mean and daily standard deviation vary periodically throughout the year (see Figs. 2.1 and 2.2). Figure 2.14 also shows the periodic behavior of the monthly flows of the Middle Fork of the American River near Auburn, California. The skewness coefficient is another characteristic which may vary periodically. See for instance Fig. 2.3 for the daily skewness coefficient of the Boise River daily flows. One way to remove the periodic skewness is by logarithmic transformation of the original series (see Sec. 3.2).

Figure 2.13. Daily flows of the Boise River, Idaho, USA, for the year 1921.

Figure 2.14. Monthly river flows for Station 11B.402, Middle Fork of the American River near Auburn, California, for the period 1931-1960 (Roesner and Yevjevich, 1966).

The time dependence characteristics of periodic series may be studied by determining the correlogram $r_k$ of Eqs. (2.5a) or (2.5b) or the correlogram $r_{k,\tau}$ of Eq. (2.10) of the original time series $x_{\nu,\tau}$. In most cases though, such correlograms are determined after removing the periodic mean and periodic standard deviation. In that case the new standardized series $z_{\nu,\tau}$ is obtained from

$$z_{\nu,\tau} = \frac{x_{\nu,\tau} - \bar{x}_\tau}{s_\tau} \tag{2.24}$$

where $\bar{x}_\tau$ and $s_\tau$ are the periodic mean and periodic standard deviation, respectively. Figure 2.15 shows an example of the correlograms $r_k$ obtained from Eq.
(2.5b) for (a) the logarithms of the original series $x_{\nu,\tau}$ and (b) the series $z_{\nu,\tau}$ of Eq. (2.24) after using the log transformation, for the monthly flows of the Middle Fork of the American River near Auburn, California. The correlogram of case (a) shows a periodic variation because $r_k$ was computed before removing the periodic mean and periodic standard deviation from the original series. On the other hand, the correlogram of case (b) does not show a periodic variation because the basic periodic characteristics were first removed from the original series before computing $r_k$.

Figure 2.15. Correlograms of (a) the logarithms of the original monthly river flows; (b) the series in the log-domain after removing the periodic mean and periodic standard deviation, for Station 11B.402, Middle Fork of the American River near Auburn, California (Roesner and Yevjevich, 1966).

Figure 2.4 shows the correlation $r_{k,\tau}$ computed from Eq. (2.10) for the daily flows of the Boise River. It shows that during the days of flows of relatively high variability the correlations are lower and more variable than during the days of the recession. As another example, the correlation $r_{k,\tau}$ of Eq. (2.10) was computed for the series $z_{\nu,\tau}$ of the Tioga River near Erwins, New York. Figure 2.16 shows the computed $r_{k,\tau}$ for the logarithmically transformed daily, 3-day, 7-day and 13-day flow series. A periodic behavior of the computed $r_{k,\tau}$ can be observed for all indicated time intervals, although for the daily series it shows high fluctuations.

Figure 2.16. Periodic correlograms of daily, 3-day, 7-day and 13-day series for the log-transformed flows of the Tioga River: (1) $r_{k,\tau}$ computed from Eq. (2.10), (2) smoothed $r_{k,\tau}$ using Fourier (trigonometric) series, and (3) the mean of $r_{k,\tau}$ (Tao, et al., 1976).
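The effect described for Fig. 2.15 can be reproduced with a small numerical experiment. The sketch below is only an illustration under stated assumptions: the monthly data are synthetic (a sinusoidal periodic mean and standard deviation with independent normal noise, not the American River flows), the periodic standard deviation uses the divisor N-1, and the correlogram uses a common lag-k estimator that may differ in detail from Eq. (2.5b). The function names `standardize_periodic` and `correlogram` are hypothetical.

```python
import math
import random
import statistics


def standardize_periodic(x):
    """x[v][tau] is the value in year v and interval tau.

    Returns the standardized series z[v][tau] together with the periodic
    means and periodic standard deviations used to obtain it.
    """
    n_intervals = len(x[0])
    means = [statistics.fmean([year[tau] for year in x]) for tau in range(n_intervals)]
    sds = [statistics.stdev([year[tau] for year in x]) for tau in range(n_intervals)]
    z = [[(year[tau] - means[tau]) / sds[tau] for tau in range(n_intervals)]
         for year in x]
    return z, means, sds


def correlogram(series, max_lag):
    """Lag-k serial correlation coefficients r_0, ..., r_max_lag of a flat series."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    return [sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


# Synthetic monthly data for 30 years: periodic mean and standard deviation,
# independent normal noise. All numbers are assumptions for the illustration.
rng = random.Random(1)
flows = [[10.0 + 8.0 * math.sin(2.0 * math.pi * tau / 12)
          + (1.5 + 0.5 * math.cos(2.0 * math.pi * tau / 12)) * rng.gauss(0.0, 1.0)
          for tau in range(12)]
         for _ in range(30)]

z, means, sds = standardize_periodic(flows)

raw_series = [v for year in flows for v in year]
std_series = [v for year in z for v in year]

r_raw = correlogram(raw_series, 24)   # oscillates with a 12-month period
r_std = correlogram(std_series, 24)   # stays close to zero for k > 0

for k in (6, 12, 18, 24):
    print(f"lag {k:2d}: raw r_k = {r_raw[k]: .2f}, standardized r_k = {r_std[k]: .2f}")
```

With these synthetic data the correlogram of the raw series swings between large negative values near lags 6 and 18 and large positive values near lags 12 and 24, while the correlogram of the standardized series shows no such periodic pattern.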
2.5 CHARACTERISTICS OF MULTIVARIATE TIME SERIES

A variable observed at several points along a line, over an area or across a space represents a multiple time series or, generally speaking, a multivariate series. Each time series can be statistically analyzed separately. However, it is often the case that the stochastic components of these series are mutually dependent random variables (dependent along the line, over the area or across the space). In other words, the stochastic components represent a set of n time series dependent among themselves. When the objective is to generate new samples of time series at a set of points, the basic requirement is not only to preserve the statistical characteristics of each of the n series, but also to preserve the mutual dependence among these n time series.

The dependence structure among n time series can be determined by computing the lag-k cross-correlation between the series. For instance, considering the series $x_t^{(i)}$ and $x_t^{(j)}$, the lag-k cross-correlation coefficient $r_k^{ij}$ is given by

$$r_k^{ij} = \frac{\sum_{t=1}^{N-k}\left(x_t^{(i)} - \bar{x}_t^{(i)}\right)\left(x_{t+k}^{(j)} - \bar{x}_{t+k}^{(j)}\right)}{\left[\sum_{t=1}^{N-k}\left(x_t^{(i)} - \bar{x}_t^{(i)}\right)^2 \sum_{t=1}^{N-k}\left(x_{t+k}^{(j)} - \bar{x}_{t+k}^{(j)}\right)^2\right]^{1/2}} \tag{2.25}$$

where $\bar{x}_t^{(i)}$ is the mean of the first N-k values of series i, and $\bar{x}_{t+k}^{(j)}$ is the mean of the last N-k values of series j. The sample cross-correlation coefficient $r_k^{ij}$ of Eq. (2.25) is the open series estimate of the population cross-correlation coefficient $\rho_k^{ij}$. For n time series it is common to represent the correlation structure by the matrix

$$M_k = \begin{bmatrix} r_k^{11} & r_k^{12} & \cdots & r_k^{1n} \\ r_k^{21} & r_k^{22} & \cdots & r_k^{2n} \\ \vdots & \vdots & & \vdots \\ r_k^{n1} & r_k^{n2} & \cdots & r_k^{nn} \end{bmatrix} \tag{2.26}$$

where the $r_k^{ij}$ are computed by Eq. (2.25).

When dealing with periodic series, the periodic dependence structure between two series $x_{\nu,\tau}^{(i)}$ and $x_{\nu,\tau}^{(j)}$ is determined by

$$r_{k,\tau}^{ij} = \frac{1}{N}\sum_{\nu=1}^{N} \frac{\left(x_{\nu,\tau}^{(i)} - \bar{x}_\tau^{(i)}\right)\left(x_{\nu,\tau-k}^{(j)} - \bar{x}_{\tau-k}^{(j)}\right)}{s_\tau^{(i)}\, s_{\tau-k}^{(j)}} \tag{2.27}$$

where $\bar{x}_\tau^{(i)}$ and $\bar{x}_{\tau-k}^{(j)}$ are the periodic means at time intervals $\tau$ and $\tau-k$, respectively, and $s_\tau^{(i)}$ and $s_{\tau-k}^{(j)}$ are the periodic standard deviations at time intervals $\tau$ and $\tau-k$, respectively. When $\tau-k < 1$ in Eq. (2.27), N is replaced by N-1, $x_{\nu,\tau-k}^{(j)}$ is replaced by $x_{\nu-1,\omega+\tau-k}^{(j)}$, $\bar{x}_{\tau-k}^{(j)}$ by $\bar{x}_{\omega+\tau-k}^{(j)}$ and $s_{\tau-k}^{(j)}$ by $s_{\omega+\tau-k}^{(j)}$, where $\omega$ is the number of time intervals in the year.

Figure 2.17 presents the location of 79 precipitation stations in the Upper Great Plains of the United States.

Figure 2.17. The study area and location of the 79 stations of precipitation in the Upper Great Plains of the USA (after Tase, 1976).

Figure 2.18 shows, for Station No. 52 of Fig. 2.17, how the lag-zero cross-correlation coefficients ($r_0$ of Eq. 2.25) between the monthly precipitation series in the area vary with latitude and longitude. This coefficient decreases with an increase of the distance between the correlated stations, so a relationship of $r_0$ to the distance d only may be sufficient to describe the regional dependence characteristic. The cross-correlation coefficient as a function of d is given in Fig. 2.19. The fitted function of $r_0$ to the distance d is computed as

$$r_0 = \exp(-0.00418\, d) \tag{2.28}$$

Figure 2.18. Isocorrelation patterns for the series of Station 52, as correlated with all the other station series, for the residual independent series of monthly precipitation in the Upper Great Plains of the USA (after Tase, 1976).

Figure 2.19. Lag-zero cross-correlation coefficient $r_0$ versus the interstation distance d, and the fitted function $r_0 = \exp(-0.00418\,d)$ for the stochastic component of monthly precipitation in the Upper Great Plains of the USA (after Tase, 1976).

In general, the characteristics of multivariate time series are often presented in the form of regional information. For instance, statistical characteristics such as the mean and standard deviation may be expressed as a function of the latitude, longitude and altitude of the gaging stations over the study area. The function describing the regional characteristics of a multivariate series may be written as

$$v = b_0 + b_1 X + b_2 Y + b_3 X^2 + b_4 Y^2 + b_5 XY \tag{2.29}$$

where v is any regional statistical characteristic, X is the longitude and Y is the latitude of the gaging station position, and $b_0, b_1, b_2, \ldots$ are the parameters of the regional equation. Relations such as Eq. (2.29) thus enable the estimation of statistical characteristics at any point of a grid of points. For instance, for the Upper Great Plains of the United States referred to in Fig. 2.17, the isolines of the general monthly mean are shown in Fig. 2.20. Similar isolines can be obtained for other statistical characteristics needed in studying the multivariate time series. For each characteristic, equations similar to Eq. (2.29) may be used.

Figure 2.20. Isolines of the 30-year general monthly mean for the precipitation of the Upper Great Plains of the USA (after Tase, 1976).

2.6 CHARACTERISTICS OF INTERMITTENT TIME SERIES

Intermittent hydrologic time series are those series that have zero or constant values for some intervals or continuous times, and non-zero values (usually positive) or non-constant values for the remaining intervals or continuous times. Examples of intermittent series are: (a) short interval (hour, day, week) precipitation, (b) small rivers that have periods of no flow, (c) sediment transport occurring only during flows greater than a critical discharge, and (d) reservoirs for which the full and empty states may be conceived as intermittencies.

When intermittency is added to the periodic and stochastic components of a hydrologic time series, the series becomes very complex and further characteristics are needed for its understanding and description. The new information concerns the off-on process (zero and non-zero values), with any patterns in their mutual successions.
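As a minimal sketch of what describing the off-on process can involve, the code below splits a short series into runs of zero ("off") and non-zero ("on") values and reports the fraction of off intervals. The function name `off_on_summary`, the threshold argument and the twelve daily values are hypothetical and serve only as an illustration; the text does not prescribe a particular set of descriptors.

```python
from itertools import groupby


def off_on_summary(series, threshold=0.0):
    """Return (fraction of off intervals, off run lengths, on run lengths).

    An interval is 'on' when its value exceeds the threshold and 'off' otherwise.
    Run lengths are listed in order of occurrence.
    """
    states = [value > threshold for value in series]
    off_runs, on_runs = [], []
    for is_on, group in groupby(states):
        length = sum(1 for _ in group)
        (on_runs if is_on else off_runs).append(length)
    fraction_off = states.count(False) / len(states)
    return fraction_off, off_runs, on_runs


# Hypothetical daily precipitation depths (zeros are dry days).
daily_precipitation = [0.0, 0.0, 1.2, 0.4, 0.0, 0.0, 0.0, 3.1, 0.0, 0.2, 0.7, 0.0]

fraction_off, off_runs, on_runs = off_on_summary(daily_precipitation)
print(fraction_off, off_runs, on_runs)
```

The lengths of the off and on runs, together with the distribution of the non-zero values, are one simple way of summarizing the succession pattern of such a series.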
Two basic approaches are used in describing the characteristics of these series: (1) intermittent series are conceived as an off-on process additional to all the other process characteristics of periodic time series, and (2) intermittent series are conceived only as truncated series of otherwise non-intermittent general time series. Evidently, when periodicity, stochasticity and intermittency are considered for a multivariate series, the complexity is so compounded that only a most advanced analysis of the series would enable a reliable statistical description and modeling of such series.

REFERENCES

Anderson, R. L., 1941. Distribution of the serial correlation coefficients. Annals of Math. Statistics, Vol. 8, No. 1, pp. 1-13, March.

Jenkins, G. M. and Watts, D. G., 1969. Spectral Analysis and its Applications. Holden-Day Series in Time Series Analysis, San Francisco, California.

Kendall, M. G. and Stuart, A., 1968. The Advanced Theory of Statistics, Vol. 3, Design and Analysis, and Time Series. 2nd edition, Hafner, New York.

Parzen, Emanuel, 1967. Time Series Analysis Papers. Holden-Day, Inc., San Francisco, California.

Roesner, L. A., and Yevjevich, V., 1966. Mathematical models for time series of monthly precipitation and monthly runoff. Hydrology Paper 15, Colorado State University, Fort Collins, Colorado.

Tao, P. C., Yevjevich, V., Kottegoda, N., 1976. Distribution of hydrologic independent stochastic components. Hydrology Paper 82, Colorado State University, Fort Collins, Colorado.

Tase, Norio, 1976. Area-deficit-intensity characteristics of droughts. Hydrology Paper 87, Colorado State University, Fort Collins, Colorado.

Yevjevich, Vujica, 1964. Fluctuations of wet and dry years, Part II, Analysis by serial correlation. Hydrology Paper 4, Colorado State University, Fort Collins, Colorado.

Yevjevich, Vujica, 1972. Stochastic Processes in Hydrology. Water Resources Publications, Fort Collins, Colorado.

Yevjevich, Vujica, 1975. Generation of hydrologic samples, Case study of the Great Lakes. Hydrology Paper 72, Colorado State University, Fort Collins, Colorado.

Chapter 3
STATISTICAL PRINCIPLES AND TECHNIQUES FOR TIME SERIES MODELING

The main objective of this chapter is to present the basic principles and techniques necessary for the stochastic modeling of hydrologic time series. The first section discusses the most common statistical techniques for estimating model parameters, including estimation by regionalization. Section 3.2 gives various procedures for transforming skewed variables into normal variables. Section 3.3 deals with the estimation of periodic parameters such as the periodic mean, periodic standard deviation and periodic correlations by using Fourier series fitting techniques. Section 3.4 discusses the solution of a matrix equation necessary for estimating the parameters of multivariate models. The techniques for testing the goodness of fit of stochastic models are given in Sec. 3.5, and the topic of preservation of historical statistics and parsimony of parameters is discussed in Sec. 3.6. Finally, Sec. 3.7 deals with the synthetic generation of new series and the use of models for forecasting. The principles and techniques given in this chapter are necessary for the modeling of univariate and multivariate series presented in subsequent chapters.

3.1 BASIC ESTIMATION TECHNIQUES

Methods derived from mathematical statistics for estimating the parameters of models representing random variables are called estimation techniques. Suppose we have a sample time series $x_1, \ldots, x_N$ and a model with parameters $\alpha$ and $\beta$ representing such a series. The estimates of these parameters obtained from the sample are denoted $\hat{\alpha}$ and $\hat{\beta}$ and are called the estimates or estimators of $\alpha$ and $\beta$. The most common estimation techniques are the method of moments, the method of least squares and the method of maximum likelihood. Depending upon the estimation technique, some estimators are better than others.
The criteria for judging the goodness of estimators are discussed first below. Subsequently, the three estimation techniques mentioned above are discussed in some detail.

3.1.1 PROPERTIES OF ESTIMATORS

Two properties are commonly used in statistics for judging the goodness of estimators. They are bias and mean square error. If the expected value $E(\hat{\alpha})$ of the estimator $\hat{\alpha}$ is equal to the population parameter $\alpha$, then $\hat{\alpha}$ is said to be an unbiased estimator of $\alpha$. Otherwise, $\hat{\alpha}$ is a biased estimator of $\alpha$ with a bias equal to $E(\hat{\alpha}) - \alpha$. The interpretation of this property is as follows. Suppose that, based on the model with parameters $\alpha$ and $\beta$, we generate m synthetic sequences of the same length N and from each sequence the estimators $\hat{\alpha}_i$, $i = 1, \ldots, m$ are computed. If the mean of the estimators $\hat{\alpha}_i$ is statistically the same as the population parameter $\alpha$, then $\hat{\alpha}$ is an unbiased estimator of $\alpha$.

The difference $(\alpha - \hat{\alpha})$ between the parameter $\alpha$ and its estimator $\hat{\alpha}$ is the error of the estimate. The expected value of the square of such error is called the mean square error (MSE) of the estimator $\hat{\alpha}$. It can be written as

$$E[(\alpha - \hat{\alpha})^2] = \mathrm{Var}(\hat{\alpha}) + [\alpha - E(\hat{\alpha})]^2 \tag{3.1}$$

Equation (3.1) shows that the MSE is equal to the variance of the estimator plus the square of the bias. Whenever $\hat{\alpha}$ is unbiased, $\alpha = E(\hat{\alpha})$, so in such a case the MSE is equal to the variance. Obviously, we would like to have an estimator as close to the parameter as possible. That is, we would like to have an estimator with a small MSE or, if possible, an estimator with the minimum MSE. If $\hat{\alpha}$ is unbiased and has minimum variance (i.e., unbiased and minimum MSE), then $\hat{\alpha}$ is an efficient estimator.

When selecting an estimator or a method of estimation, we should consider both properties, bias and MSE. We would like to have both desirable properties: an unbiased estimator and a minimum MSE estimator. In some cases an estimator may be unbiased but it may not be the minimum MSE estimator. In other cases it may be the opposite.
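The interpretation of bias by repeated synthetic sequences, and the decomposition of Eq. (3.1), can be checked numerically. The sketch below is an illustration under assumptions that are not part of the text: the sequences are independent normal values with a known variance of 4, each sequence has length N = 10, and the two estimators compared are the sample variance with divisor N and with divisor N-1. The names `variance_estimates` and `summarize` are hypothetical.

```python
import random

TRUE_VARIANCE = 4.0   # population parameter assumed for the experiment
N = 10                # length of each synthetic sequence
M = 20000             # number of synthetic sequences

rng = random.Random(12345)


def variance_estimates(sample):
    """Return the divisor-N and divisor-(N-1) estimates of the variance."""
    mean = sum(sample) / len(sample)
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / len(sample), ss / (len(sample) - 1)


def summarize(estimates, parameter):
    """Empirical bias, variance and mean square error of a list of estimates."""
    m = len(estimates)
    mean_estimate = sum(estimates) / m
    bias = mean_estimate - parameter
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / m
    mse = sum((parameter - e) ** 2 for e in estimates) / m
    return bias, variance, mse


divisor_n, divisor_n_minus_1 = [], []
for _ in range(M):
    sample = [rng.gauss(0.0, TRUE_VARIANCE ** 0.5) for _ in range(N)]
    est_n, est_n1 = variance_estimates(sample)
    divisor_n.append(est_n)
    divisor_n_minus_1.append(est_n1)

bias_n, var_n, mse_n = summarize(divisor_n, TRUE_VARIANCE)
bias_n1, var_n1, mse_n1 = summarize(divisor_n_minus_1, TRUE_VARIANCE)

print(f"divisor N:   bias = {bias_n: .3f}, variance = {var_n:.3f}, MSE = {mse_n:.3f}")
print(f"divisor N-1: bias = {bias_n1: .3f}, variance = {var_n1:.3f}, MSE = {mse_n1:.3f}")
```

In this experiment the divisor N-1 estimator is practically unbiased, while the divisor N estimator is biased but has the smaller MSE, which is an instance of an unbiased estimator that is not the minimum MSE estimator. For both estimators the empirical MSE equals the empirical variance plus the squared bias, as Eq. (3.1) states.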
Furthermore, estimators are often biased and do not have a minimum MSE. Therefore, when selecting among alternative estimators, one criterion is to select the estimator with the smallest bias and the smallest MSE. When this is not possible, the analyst must judge which of the two properties is more desirable for a particular case and select the estimator accordingly.

In addition to the properties of bias and mean square error discussed above, the properties of consistency and sufficiency are important for describing estimators. Assume that $\hat{\alpha}_N$ is an estimator of the parameter $\alpha$ determined from a sample of size N. If $\hat{\alpha}_N \to \alpha$ as N increases, then $\hat{\alpha}_N$ is a consistent estimator of $\alpha$. Finally, if the estimators make maximum use of the information contained in the data they are said to be sufficient (Benjamin and Cornell, 1970).

3.1.2 METHOD OF MOMENTS

The expected value $E[X]$ of a random variable X is called the first population moment of X. In general, the expected value $E[X^r]$ is called the r-th population moment of X. Similarly, when dealing with a sample $x_1, x_2, \ldots, x_N$, the r-th sample moment is defined by $m_r = (1/N)\sum_{i=1}^{N} x_i^r$. If the random variable X represents a time series given by a model with parameters $\alpha_1, \ldots, \alpha_p$, the population moments are functions of those parameters. Therefore, the moment parameter estimates are obtained by equating population moments and sample moments, and solving for the parameters. If p is the number of parameters to estimate, then the first p population and sample moments must be equated and solved simultaneously.

For instance, consider the time series model $X_t = a + bZ_t$ where a and b are the parameters and $Z_t$ is an independent random variable with mean zero and variance one. We would like to estimate the parameters a and b based on a sample series $X_1, \ldots, X_N$. Since we have two parameters, the first and the second population and sample moments must be equated. The first two population moments of the model are

$$E[X] = a \tag{3.2}$$

and

$$E[X^2] = a^2 + b^2 \tag{3.3}$$

Similarly, the first two sample moments of the series are

$$m_1 = \frac{1}{N}\sum_{i=1}^{N} X_i \tag{3.4}$$

and

$$m_2 = \frac{1}{N}\sum_{i=1}^{N} X_i^2 \tag{3.5}$$

Equating the first population and sample moments of Eqs. (3.2) and (3.4) respectively, we obtain the estimate $\hat{a}$ of the parameter a as

$$\hat{a} = \frac{1}{N}\sum_{i=1}^{N} X_i \tag{3.6}$$

Equating the second population and sample moments of Eqs. (3.3) and (3.5) respectively, we obtain the estimate $\hat{b}$ of the parameter b as

$$\hat{b} = \left[\frac{1}{N}\sum_{i=1}^{N} X_i^2 - \hat{a}^2\right]^{1/2} = \left[\frac{1}{N}\sum_{i=1}^{N} (X_i - \hat{a})^2\right]^{1/2} \tag{3.7}$$

where the symbol ^ is used to give the meaning of estimate to the corresponding parameter. Note in the above example that the parameters a and b of the model are actually the mean and standard deviation of X. Therefore, their estimates given by Eqs. (3.6) and (3.7) are the sample mean and the sample standard deviation, respectively.

The estimation of parameters by the method of moments is usually not difficult and is simpler than estimation by the other methods. Often the moment estimates are used as first approximations for the estimation by other methods. Except for the estimate of the mean, the moment estimates of other parameters are usually biased, although adjustments can be applied to make them unbiased. Moment estimates are asymptotically efficient when the underlying distribution is normal. For skewed variables, though, the moment estimators generally are not asymptotically efficient.

3.1.3 METHOD OF LEAST SQUARES

Consider that the model of a sample time series $y_1, \ldots, y_N$ is

$$y_t = f(y_{t-1}, y_{t-2}, \ldots; \alpha_1, \ldots, \alpha_p) + \varepsilon_t$$

where $\alpha_1, \ldots, \alpha_p$ are the parameters and $\varepsilon_t$ is the residual or error series, which has zero mean. The least squares estimation method is based on finding the estimates $\hat{\alpha}_1, \ldots, \hat{\alpha}_p$ so that the sum of the squared differences between the observed values $y_1, \ldots, y_N$ and the estimated expected values $\hat{y}_t = f(y_{t-1}, y_{t-2}, \ldots; \hat{\alpha}_1, \ldots, \hat{\alpha}_p)$, $t = 1, \ldots, N$, respectively, is minimized. That is, $\hat{\alpha}_1, \ldots, \hat{\alpha}_p$ should be chosen to minimize

$$\sum_{t=1}^{N} \varepsilon_t^2 = \sum_{t=1}^{N} (y_t - \hat{y}_t)^2 = \sum_{t=1}^{N} \left[y_t - f(y_{t-1}, y_{t-2}, \ldots; \hat{\alpha}_1, \ldots, \hat{\alpha}_p)\right]^2 \tag{3.8}$$

To find the minimum of the sum (3.8), all partial derivatives of the sum with respect to $\hat{\alpha}_1, \ldots, \hat{\alpha}_p$ must be equal to zero. That is

$$\frac{\partial \sum \varepsilon_t^2}{\partial \hat{\alpha}_1} = 0, \quad \ldots, \quad \frac{\partial \sum \varepsilon_t^2}{\partial \hat{\alpha}_p} = 0 \tag{3.9}$$

These p equations with p unknowns must be solved simultaneously to obtain the parameter estimates $\hat{\alpha}_1, \ldots, \hat{\alpha}_p$. These estimates are efficient when certain conditions are satisfied (Yevjevich, 1972a).

Consider the time series model $y_t = \phi_1 y_{t-1} + \varepsilon_t$ where the mean of $y_t$ is zero, the mean of $\varepsilon_t$ is also zero and the variance $\sigma_\varepsilon^2$ of $\varepsilon_t$ is assumed known. We would like to determine the least squares estimate of the parameter $\phi_1$, based on the sample series $y_1, y_2, \ldots, y_N$. The sum of the squares of the errors as in Eq. (3.8) is

$$\sum \varepsilon_t^2 = \sum (y_t - \hat{\phi}_1 y_{t-1})^2$$

and the partial derivative of the sum with respect to $\hat{\phi}_1$ is

$$\frac{\partial \sum \varepsilon_t^2}{\partial \hat{\phi}_1} = -2\sum (y_t - \hat{\phi}_1 y_{t-1})\, y_{t-1}$$

Equating it to zero and solving for $\hat{\phi}_1$ we have

$$\hat{\phi}_1 = \frac{\sum y_t\, y_{t-1}}{\sum y_{t-1}^2} \tag{3.10}$$

Note that in the above expressions the summation varies from t = 2 through t = N.

3.1.4 METHOD OF MAXIMUM LIKELIHOOD

Consider that the time series model $y_t = f(y_{t-1}, y_{t-2}, \ldots; \alpha_1, \ldots, \alpha_p) + \varepsilon_t$ is to be fitted to a sample series $y_1, \ldots, y_N$, where $\alpha = \{\alpha_1, \ldots, \alpha_p\}$ is the parameter set of the model and $\varepsilon_t$ is the error term. The joint probability density of $\varepsilon_1, \ldots, \varepsilon_N$ is called the likelihood function $L(\cdot)$ and it is expressed as

$$L(\cdot) = f(\varepsilon_1; \alpha)\, f(\varepsilon_2; \alpha) \cdots f(\varepsilon_N; \alpha) = \prod_{t=1}^{N} f(\varepsilon_t; \alpha) \tag{3.11}$$

The maximum likelihood estimate of the parameter set $\alpha$ is obtained when the function $L(\cdot)$ of Eq. (3.11) is a maximum. The same estimates are obtained if the log-likelihood function is maximized instead of the likelihood function. In such a case the log-likelihood function is written as

$$LL(\cdot) = \ln \prod_{t=1}^{N} f(\varepsilon_t; \alpha) = \sum_{t=1}^{N} \ln f(\varepsilon_t; \alpha) \tag{3.12}$$

The partial derivatives of $LL(\cdot)$ with respect to the parameters $\alpha_1, \ldots, \alpha_p$ equated to zero are

$$\frac{\partial LL(\cdot)}{\partial \alpha_1} = 0, \quad \ldots, \quad \frac{\partial LL(\cdot)}{\partial \alpha_p} = 0 \tag{3.13}$$

Then the solution of Eqs. (3.13) yields the maximum likelihood estimates $\hat{\alpha}_1, \ldots, \hat{\alpha}_p$. The maximum likelihood estimators are asymptotically efficient.
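As a rough numerical illustration of maximizing a log-likelihood, the sketch below uses the first order autoregressive model of the least squares example, y_t = phi_1 y_{t-1} + e_t, with the additional assumptions that the errors are normal with a known standard deviation and that the first residual is left out of the likelihood. The series is synthetic, the value 0.6 used to generate it is arbitrary, and the grid search stands in for solving Eqs. (3.13) analytically; none of these choices comes from the text.

```python
import math
import random

PHI_TRUE = 0.6     # assumed value, used only to generate the synthetic series
SIGMA_EPS = 1.0    # standard deviation of the errors, taken as known
N = 500

rng = random.Random(2024)
# Start from the stationary distribution of the process, then iterate the model.
y = [rng.gauss(0.0, SIGMA_EPS / math.sqrt(1.0 - PHI_TRUE ** 2))]
for _ in range(N - 1):
    y.append(PHI_TRUE * y[-1] + rng.gauss(0.0, SIGMA_EPS))


def log_likelihood(phi):
    """Gaussian log-likelihood of the residuals for t = 2, ..., N."""
    residuals = [y[t] - phi * y[t - 1] for t in range(1, N)]
    n = len(residuals)
    return (-n * math.log(math.sqrt(2.0 * math.pi) * SIGMA_EPS)
            - sum(e * e for e in residuals) / (2.0 * SIGMA_EPS ** 2))


# Crude maximization: evaluate the log-likelihood on a grid of phi values.
grid = [i / 1000.0 for i in range(-990, 991)]
phi_grid = max(grid, key=log_likelihood)

# Closed-form least squares estimate with sums from t = 2 through t = N.
phi_closed = (sum(y[t] * y[t - 1] for t in range(1, N))
              / sum(y[t - 1] ** 2 for t in range(1, N)))

print(f"grid maximum of LL: {phi_grid:.3f}, closed form: {phi_closed:.4f}")
```

Because the log-likelihood is a downward parabola in phi under these assumptions, the grid maximum falls on the grid point nearest to the closed-form value, so the two estimates agree to within the grid spacing.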
Maximum likelihood estimators are also consistent estimators (if consistent estimators exist) as well as sufficient estimators.

As in the previous method, consider the time series model $y_t = \phi_1 y_{t-1} + \varepsilon_t$ where $\varepsilon_t$ is assumed normal with mean zero and variance $\sigma_\varepsilon^2$. We wish to determine the maximum likelihood estimate of the parameter $\phi_1$ given that we have a sample series $y_1, \ldots, y_N$. The likelihood function $L(\cdot)$ of Eq. (3.11) can be written as

$$L(\cdot) = \left(\frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon}\right)^{N} \exp\left\{-\frac{1}{2\sigma_\varepsilon^2}\left(\varepsilon_1^2 + \sum_{t=2}^{N}\varepsilon_t^2\right)\right\} \tag{3.14}$$

Note that with the model $y_t = \phi_1 y_{t-1} + \varepsilon_t$ and the sample series $y_1, \ldots, y_N$, only $\varepsilon_2, \ldots, \varepsilon_N$ can be determined since $\varepsilon_1$ would require the knowledge of $y_0$. Therefore, if $\varepsilon_1$ is not included in Eq. (3.14), the true likelihood function is not obtained but only an approximation. There are ways in which $\varepsilon_1$ can be predicted so that a better approximation of the likelihood function is obtained. For more details on how to predict the starting values of a time series (as in the case of $\varepsilon_1$ of the above example) the interested reader is referred to specific books on time series analysis, for instance the book of Box and Jenkins (1970).

For our example let us assume that $\varepsilon_1$ is either predicted or avoided in Eq. (3.14). Then the log-likelihood function of Eq. (3.12) becomes

$$LL(\cdot) = -N\ln\left(\sqrt{2\pi}\,\sigma_\varepsilon\right) - \frac{1}{2\sigma_\varepsilon^2}\sum_{t=1}^{N}\varepsilon_t^2$$

$$LL(\cdot) = -N\ln\left(\sqrt{2\pi}\,\sigma_\varepsilon\right) - \frac{1}{2\sigma_\varepsilon^2}\sum_{t=1}^{N}(y_t - \phi_1 y_{t-1})^2 \tag{3.15}$$

Taking the partial derivative of $LL(\cdot)$ with respect to $\phi_1$, equating it to zero as in Eq. (3.13) and solving for $\hat{\phi}_1$ we have

$$\hat{\phi}_1 = \frac{\sum_{t=1}^{N} y_t\, y_{t-1}}{\sum_{t=1}^{N} y_{t-1}^2} \tag{3.16}$$

Observe that if $\varepsilon_1$ is not predicted, the summations of Eq. (3.16) would not include $y_0$, in which case the approximate maximum likelihood estimate of $\phi_1$ of Eq. (3.16) would be the same as its least squares estimate of Eq. (3.10). Generally, if $\varepsilon_1$ is predicted, then $\hat{\phi}_1$ of Eq. (3.16) would be a better estimate than $\hat{\phi}_1$ of Eq. (3.10).

3.1.5 JOINT ESTIMATION OF PARAMETERS

Whether the method of moments, least squares or maximum likelihood is used for estimating the parameters of a model, the parameters should be jointly estimated.
The reason is that the sample estimates are mutually dependent variables. When the model involves a small number of parameters, as in the case of models of annual time series, the joint estimation is simple to obtain. However, for models of periodic time series, which usually involve a large number of parameters, the joint estimation becomes more complex, generally requiring the solution of a large number of non-linear equations.

In order to simplify or avoid such complex estimations, a sequential procedure for estimating one group of parameters at a time is usually followed. For example, when dealing with non-normal periodic time series, the parameters of the normal transformation function are determined first. Once the transformation is made, the periodic parameters in the mean and in the standard deviation are estimated and removed from the time series. Then, the dependence structure of the residual series is analyzed and the corresponding model parameters are estimated. How much the parameters obtained by the above sequential procedure differ from those estimated jointly is difficult to assess, and it seems that such a comparison has not been made up to the present.

3.1.6 PARAMETER ESTIMATION BY REGIONALIZATION

Regionalization is used in time series modeling mainly for (1) determining the model parameters at ungaged sites, and (2) improving the estimates of model parameters at sites with short records. In the first case, the model parameters are determined at sites with records and they are regionalized in mathematical form as in Eq. (2.21) or in graphical form by regional parameter isolines as in Fig. 2.20. In the second case, the improvement of parameter estimates can be made by transferring information from the stations with long records to the station with short records. Under certain correlation conditions, the transfer of information can improve the accuracy of parameter estimates.
Improvement in modeling at the station with a short record can also be made by adopting the type of model identified and estimated at other stations in the region with longer records.

3.2 NORMALIZATION OF TIME SERIES VARIABLES

Most probability theory and statistical techniques applied to hydrology in general, and to hydrologic time series analysis in particular, are developed assuming that the variables are normally distributed (Gaussian). Because most frequency curves of hydrologic variables are asymmetrically distributed, or are bounded by zero (they are positively valued variables), it is often necessary to transform those variables to normal before carrying out the statistical analysis of interest. The first part of this section treats the normalization of annual series and the second part the normalization of periodic time series. Some further discussion on this subject is also made in Chapters 4 and 7.

3.2.1 NORMALIZATION OF ANNUAL TIME SERIES

For a large number of annual series of precipitation and runoff in the United States, several authors have found (see Markovic, 1965) that the two-parameter lognormal (lognormal-2) probability distribution function fits their frequency distribution well. Assuming that the annual time series $x_t$ is represented by a lognormal-2 distribution, its probability density function is

$$f(x) = \frac{1}{x\, \sigma_y \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left[ \frac{\log(x) - \mu_y}{\sigma_y} \right]^2 \right\} \qquad (3.17)$$

where $\mu_y$ and $\sigma_y$ are the parameters of the function. They have the subscript y because they represent the mean and the standard deviation of y = log(x), respectively. Therefore, the transformation of x by

$$y = \log(x) \qquad (3.18)$$

yields a normal series $y_t$ with mean $\mu_y$ and standard deviation $\sigma_y$. This also means that

$$\frac{\log(x_t) - \mu_y}{\sigma_y} \qquad (3.19)$$

is a standard normal series with mean zero and standard deviation one.

If, however, the two-parameter gamma (gamma-2) probability distribution function is used for the annual time series $x_t$, namely

$$f(x) = \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta} \qquad (3.20)$$

where $\alpha$ and $\beta$ are the parameters and $\Gamma(\cdot)$
represents the gamma function operator, then the transformation

$$y = \sqrt{x} \qquad (3.21)$$

produces a relatively symmetrical (though not exactly normal) variable.

The above transformations of Eqs. (3.18) and (3.21) could include a lower bound c. For instance, if x is replaced by x-c in Eqs. (3.17) through (3.20), then x will be distributed as lognormal or gamma with three parameters. In some applications the lower bound is obtained by experience or by trial, although analytical estimation procedures are available (Yevjevich, 1972a, p. 140, 148). An extension of the power transformation of Eq. (3.21) can be used as

$$y = a (x - c)^{b} \qquad (3.22)$$

where c is the lower boundary parameter and a and b are the other parameters. Values of b are usually in the order of 1/2, 1/3 and 1/4. More complex transformations are also available. The reader is referred to books on statistics, probability theory or time series analysis, such as the book by Box and Tiao (1973, p. 530).

3.2.2 NORMALIZATION OF PERIODIC TIME SERIES

The problem often posed in the normalization of periodic time series is at what point in the analysis the transformation should be made. For instance, the transformation could be made directly on the original series $x_{\nu,\tau}$ (before the periodic means $\mu_\tau$ and the periodic standard deviations $\sigma_\tau$ are estimated and removed from the series), or the transformation could be made after the periodic parameters $\mu_\tau$ and $\sigma_\tau$ are estimated and removed from the series $x_{\nu,\tau}$. This problem has not yet been properly resolved for practical modeling and parameter estimation of periodic hydrologic time series. Also, reliable inference techniques are needed for testing whether the transformed variable satisfies the assumption of being close to a normal probability distribution.
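The effect of the transformations of Eqs. (3.18) and (3.22) on the asymmetry of a sample can be inspected through the skewness coefficient, as in the following sketch. It is not a formal test of normality; the data values, the function names and the choice of the moment-based skewness estimator are assumptions made here for illustration.

```python
import math


def skewness(values):
    """Sample skewness coefficient (moment estimator, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(x, c=0.0):
    """y = log(x - c): Eq. (3.18) with an optional lower bound c."""
    return [math.log(v - c) for v in x]


def power_transform(x, a=1.0, b=0.5, c=0.0):
    """y = a * (x - c)**b: Eq. (3.22)."""
    return [a * (v - c) ** b for v in x]


# Hypothetical annual flows (arbitrary units), positively skewed.
x = [12.0, 15.0, 18.0, 21.0, 25.0, 30.0, 37.0, 48.0, 66.0, 110.0]

candidates = {"none": x, "log": log_transform(x)}
for b in (1 / 2, 1 / 3, 1 / 4):
    candidates[f"power b={b:.2f}"] = power_transform(x, b=b)

for name, y in candidates.items():
    print(f"{name:14s} skewness = {skewness(y):6.3f}")
```

A transformation whose output has a skewness close to zero is a candidate for further checking; with short records the skewness estimate itself is unreliable, so the comparison should be read as a rough guide only.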
If the normalization is carried out directly on $x_{\nu,\tau}$, the periodicities in the mean and standard deviation of the transformed series $y_{\nu,\tau}$ will be highly distorted in comparison with those of the original series $x_{\nu,\tau}$. On the other hand, if the normalization is carried out in terms of the standardized series

$$y_{\nu,\tau} = \frac{x_{\nu,\tau} - \hat{\mu}_\tau}{\hat{\sigma}_\tau}$$

where $\hat{\mu}_\tau$ and $\hat{\sigma}_\tau$ are estimates of $\mu_\tau$ and $\sigma_\tau$, respectively, then the $y_{\nu,\tau}$ series will have both positive and negative values and generally it would require more complex transformations.

Transformation of the original asymmetric periodic time series $x_{\nu,\tau}$ into a normally distributed time series $y_{\nu,\tau}$ may be accomplished by using various functions. A general transformation function has the form

$$y_{\nu,\tau} = a_\tau (x_{\nu,\tau} - c_\tau)^{b_\tau} \qquad (3.23)$$

where $c_\tau$ is the periodic lower boundary parameter, and $a_\tau$ and $b_\tau$ are the other periodic parameters. Often log transformations as in Eq. (3.18) with no parameters, or

$$y_{\nu,\tau} = \log(x_{\nu,\tau} - c_\tau) \qquad (3.24)$$

with one parameter, are used.

In the above transformations of Eqs. (3.23) and (3.24) it is indicated that the parameters vary with $\tau$, where $\tau = 1, \ldots, \omega$ and $\omega$ is the number of time intervals in the year. However, it may not be necessary to have all parameters varying with $\tau$. In some cases, especially when $\omega$ is large, say $\omega = 52$ for weekly series, $\omega = 365$ for daily series or even $\omega = 12$ for monthly series, the parameters, although varying with time, can remain constant for several values of $\tau$. For instance, for $\omega = 12$ it may not be necessary to have twelve values of $c_\tau$ but only four values, one for each season.

In the normal transformations of periodic time series the same type of transformation is usually applied for all time intervals $\tau$ within the year. However, the type of transformation function may also vary with time, because the mixed distribution of, say, daily variables that produce weekly, monthly or other seasonal variables does not need to be composed of the same type of probability distribution function.
Atmospheric and river basin processes and responses during dry periods do not have to produce the same type of probability distribution function as during wet periods. Therefore, in some cases, the assumption of the same distribution as in Eqs. (3.17) or (3.20), or of the same transformation function as in Eq. (3.23) or (3.24), for all values of $\tau$ may not be realistic and may lead to some distortions in modeling.

3.2.3 REMARKS

Generally, three main approaches have been proposed for dealing with skewed hydrologic time series: (1) to transform the skewed series into normal before modeling the series; (2) to model the original skewed series and account for the skewness by finding the probability distribution of the uncorrelated residuals; and (3) to find a relationship between the first two moments of the original skewed series and those of the normal series so that the moments of the original skewed series are preserved. Each of the above approaches has some pros and cons that must be considered for practical application.

The major advantage of the first approach lies in the fact that the best techniques in statistics and stochastic processes are developed for normal processes. So it is simpler to transform the skewed variables into normal (or at least close to normal) than to find similar procedures for non-normal variables, or to try to assess the errors of applying the methods developed for normal variables to skewed variables, especially to those variables of small time intervals such as hourly, daily or even weekly series, which usually are highly skewed. On the other hand, when the original series is transformed into normal, biases in the statistical properties (such as the mean and standard deviation) of the generated series may occur. In other words, the mean of the transformed series may be reproduced in the generation but not the mean of the original series (before transformation).
If biases are small it would still be advisable to use the first approach of transformation to normal, but if biases are large then the second or third approach may be useful.

When the modeling is made directly on the original skewed time series (second approach), the estimation and testing of the model are not as efficient as if the series were normal. Furthermore, since time series models are usually linear, such as AR and ARMA models, then to a certain extent, due to the Central Limit Theorem of probability theory, the skewness of the generated dependent variable will be smaller than the skewness of the original dependent series. However, this may not be that critical, since estimates of the skewness for the typical lengths of hydrologic records are very unreliable, so some small degree of bias in the skewness may be acceptable.

An alternative for avoiding biases in the mean and standard deviation would be the third approach, that is, using a relationship between the moments of the skewed and normal series. Matalas (1967) and Fiering and Jackson (1971) describe how to estimate the parameters of the log-transformed series so as to reproduce the parameters of the original series. Furthermore, Mejia and Rodriguez-Iturbe (1974) presented another approach in order to avoid biases in the correlation structure of the generated series.

In conclusion, several methods of transformation of skewed time series into normal series have been presented which are applicable to annual and periodic time series. Some of the advantages and limitations of such transformations for the reproduction of the main statistical properties of the historical series were discussed, as well as two other alternative procedures for approaching the modeling of skewed time series.
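The bias in the mean mentioned above, and the idea behind the third approach, can be illustrated with the standard moment relations of the two-parameter lognormal distribution. The sketch below assumes natural logarithms and an exactly lognormal variable; the function names and the numerical values are hypothetical, and the relations are the textbook lognormal ones rather than a reproduction of the formulas in the cited references.

```python
import math


def log_params_from_moments(mean_x, std_x):
    """Parameters (mu_y, sigma_y) of y = ln(x) that reproduce mean_x and std_x,
    assuming x is lognormal-2 and natural logarithms are used."""
    var_y = math.log(1.0 + (std_x / mean_x) ** 2)
    mu_y = math.log(mean_x) - 0.5 * var_y
    return mu_y, math.sqrt(var_y)


def moments_from_log_params(mu_y, sigma_y):
    """Mean and standard deviation of x = exp(y) for y normal(mu_y, sigma_y)."""
    mean_x = math.exp(mu_y + 0.5 * sigma_y ** 2)
    std_x = mean_x * math.sqrt(math.exp(sigma_y ** 2) - 1.0)
    return mean_x, std_x


# Hypothetical historical statistics of the original (skewed) series.
mean_x, std_x = 100.0, 60.0
mu_y, sigma_y = log_params_from_moments(mean_x, std_x)

# Back-transforming only the mean of the logs underestimates the mean of x.
naive_mean = math.exp(mu_y)
recovered = moments_from_log_params(mu_y, sigma_y)
print(round(naive_mean, 2), [round(v, 2) for v in recovered])
```

Choosing the parameters of the log series from the moments of the original series, rather than from the sample moments of the logs, is what allows the mean and standard deviation of the original series to be preserved under these assumptions.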
The modeling procedures described in Chapters 4 through 8 of this text are essentially based on transforming the series into normal when they have significant skewness. Alternatively, suggestions are made to use the second approach (modeling the original skewed series) when actually needed. Further discussion and comments about those two approaches are made especially in Chapters 4 and 7. We do not use the third approach (relation of moments of skewed and normal series) in the remaining text. We feel that in most practical cases of hydrologic time series analysis the first two approaches can solve the problem of skewness. For some special cases the interested reader may wish to consult the references given above for more details on the third approach.

3.3 ESTIMATION OF PERIODIC PARAMETERS BY FOURIER SERIES

Let us assume that $y_{\nu,\tau}$ represents a periodic hydrologic time series, where $\nu$ is the year and $\tau$ is the time interval within the year. The mathematical model for $y_{\nu,\tau}$ can be generally written as

$$y_{\nu,\tau} = \mu_\tau + \sigma_\tau z_{\nu,\tau} \qquad (3.25)$$

where $\mu_\tau$ and $\sigma_\tau$ are the periodic mean and periodic standard deviation, respectively, and $z_{\nu,\tau}$ is usually a time dependent series with either constant or periodic correlation structure. For example,

$$z_{\nu,\tau} = \phi_{1,\tau}\, z_{\nu,\tau-1} + \varepsilon_{\nu,\tau} \qquad (3.26)$$

is an AR(1) model with periodic autoregression coefficient $\phi_{1,\tau}$. Higher order AR models as well as ARMA models may also have periodic parameters. It may be shown that these parameters are functions of the lag-k periodic correlation coefficients $\rho_{k,\tau}$ of the series $z_{\nu,\tau}$. For instance, in the case of the AR(1) model of Eq. (3.26), $\phi_{1,\tau} = \rho_{1,\tau}$, where $\rho_{1,\tau}$ is the periodic first correlation coefficient of $z_{\nu,\tau}$ (see Section 4.3.2).

Since the periodic population parameters $\mu_\tau$, $\sigma_\tau$ and $\rho_{k,\tau}$ are not known, they must be estimated from historical data. Two procedures can be followed for obtaining those estimates: (1) the non-parametric approach, that is, the mean, standard deviation and correlation coefficients are determined directly from Eqs.
(2.7), (2.8) and (2.10), respectively, or (2) by Fourier series fit of the estimates referred to in (1). The purpose of this section is to describe the Fourier series analysis for estimating periodic parameters such as $\mu_\tau$, $\sigma_\tau$ and $\rho_{k,\tau}$. The first part of this section gives some reasons why the Fourier series analysis may be useful in practice. The second part describes the main equations used for Fourier series analysis, and the third part deals with the criteria for selecting the significant terms of the Fourier series equations.

3.3.1 JUSTIFICATION OF USING FOURIER SERIES

During the past years much controversy has been raised among hydrologists about whether or not to use Fourier series analysis for estimating the periodic parameters of periodic stochastic models. Therefore, it is fair to put forth some arguments and criteria about why and in what cases the use of Fourier series is justified.

Consider the case of estimating the periodic mean $\mu_\tau$. One estimate of $\mu_\tau$ is obtained from Eq. (2.7) as

$$\bar{y}_\tau = \frac{1}{N} \sum_{\nu=1}^{N} y_{\nu,\tau}, \quad \tau = 1, \ldots, \omega \qquad (3.27)$$

where $\omega$ is the total number of time intervals within the year. Then, the differences between the estimate $\bar{y}_\tau$ of Eq. (3.27) and the population parameter $\mu_\tau$ are

$$e_\tau = \mu_\tau - \bar{y}_\tau, \quad \tau = 1, \ldots, \omega \qquad (3.28)$$

Because the number of years N is usually small for most hydrologic time series, the individual sampling errors $e_\tau$ of Eq. (3.28) are often large. Besides, if $\omega$ is large, say 52 for weekly series or 365 for daily series, all 52 or 365 values of $\bar{y}_\tau$ cannot be estimated accurately, and the use of too many parameter estimates violates the principle of statistical parsimony in the number of parameters (see Sec. 3.6).

Experience and the physical analysis of the responses of hydrologic environments to periodic stochastic inputs show that the periodic mean is a relatively smooth function. It is sufficient to estimate $\bar{y}_\tau$ of a series for samples of different sizes to find that the smoothness of the sequence $\bar{y}_\tau$ increases as the sample size increases.
Therefore, it is expected that in general, by using the Fourier series fit of $\bar{y}_\tau$, the resulting estimated periodic function $\hat{\mu}_\tau$ will be smoother than $\bar{y}_\tau$ and will have smaller sampling errors

$$\mu_\tau - \hat{\mu}_\tau, \quad \tau = 1, \ldots, \omega \qquad (3.29)$$

than the corresponding errors of Eq. (3.28). In the same manner it may be argued that $\hat{\sigma}_\tau$, the Fourier series fit of the standard deviation $s_\tau$ of Eq. (2.8), is a smooth function so that the errors $\sigma_\tau - \hat{\sigma}_\tau$ are smaller on the average than the errors $\sigma_\tau - s_\tau$. Similar arguments can be given for using the Fourier series fit of any other periodic parameter of the series $y_{\nu,\tau}$.

Although the use of a Fourier series fit of periodic parameters may often be convenient as indicated above, in some cases it may lead to serious distortions of the dependence structure of the residual series $z_{\nu,\tau}$ of Eq. (3.25) (Rodriguez-Iturbe et al., 1971). This is especially true when a good fit of the periodic parameters during the dry season is difficult to find. For instance, in an extreme case, the fitted periodic mean or periodic standard deviation may even become negative. In such cases the use of Fourier series should be avoided. In general, Fourier series analysis for estimating periodic parameters of hydrologic series should be used with judgment. Special care should be taken especially by those with no experience with such a technique. However, as familiarity and experience with it are gained, the user will realize its practical advantages as well as its limitations.

Just as Fourier series analysis can be used for smoothing out the sampling variations of parameters in time, it can be applied for smoothing out sampling variations in space. Karplus and Yevjevich (1973) used it for estimating monthly precipitation parameters in the Central Plains region of the United States.
Salas (1975) applied Fourier analysis for estimating regional periodic parameters of weekly streamflow in the central part of the Peruvian Andes, and Woolhiser and Pegram (1979) used it for estimating daily precipitation parameters in various regions of the United States. In using Fourier series analysis for multiple time series, checks should be made for obtaining a consistent dependence structure in space in addition to verifying the time dependence structure. As stated above, care and judgment should be used for the best use of Fourier series in hydrologic regional analysis.

3.3.2 ESTIMATION OF FOURIER SERIES COEFFICIENTS

Let us consider that $u_\tau$ represents a periodic statistical characteristic of the hydrologic series $y_{\nu,\tau}$, such as the periodic mean $\bar{y}_\tau$, the standard deviation $s_\tau$ and the correlation coefficients $r_{k,\tau}$ of Eqs. (2.7), (2.8) and (2.10), respectively. Assume also that $u_\tau$ is a (nonparametric) sample estimate of the unknown population periodic parameter denoted by $v_\tau$. The parametric or Fourier series representation of $v_\tau$, denoted in general as $\hat{v}_\tau$, can be obtained by (Yevjevich, 1972b)

$$\hat{v}_\tau = \bar{u} + \sum_{j=1}^{h} \left[ A_j \cos(2\pi j \tau/\omega) + B_j \sin(2\pi j \tau/\omega) \right], \quad \tau = 1, \ldots, \omega \qquad (3.30)$$

where $\bar{u}$ is the mean of $u_\tau$, $A_j$ and $B_j$ are the Fourier series coefficients, j is the harmonic and h is the total number of harmonics, which is equal to $\omega/2$ or $(\omega-1)/2$ depending on whether $\omega$ is even or odd, respectively. For instance, for monthly series where $\omega = 12$, h = 6; for weekly series with $\omega = 52$, h = 26; and for daily series with $\omega = 365$, h = 182.

The mean $\bar{u}$ and the Fourier coefficients $A_j$ and $B_j$ are determined by

$$\bar{u} = \frac{1}{\omega} \sum_{\tau=1}^{\omega} u_\tau \qquad (3.31)$$

$$A_j = \frac{2}{\omega} \sum_{\tau=1}^{\omega} u_\tau \cos(2\pi j \tau/\omega), \quad j = 1, \ldots, h \qquad (3.32)$$

and

$$B_j = \frac{2}{\omega} \sum_{\tau=1}^{\omega} u_\tau \sin(2\pi j \tau/\omega), \quad j = 1, \ldots, h \qquad (3.33)$$

When $\omega$ is even the last coefficients $A_h$ and $B_h$ are given by

$$A_h = \frac{1}{\omega} \sum_{\tau=1}^{\omega} u_\tau \cos(2\pi h \tau/\omega) \qquad (3.34)$$

and

$$B_h = 0 \qquad (3.35)$$

When $\hat{v}_\tau$ of Eq. (3.30) is determined considering all the harmonics j = 1, ..., h (all the coefficients $A_j$ and $B_j$), $\hat{v}_\tau$ is exactly the same as $u_\tau$ for all the values of $\tau = 1, \ldots, \omega$. In practice though, a smaller number of harmonics h* < h is used. That is, $\hat{v}_\tau$ of Eq.
(3.30) should be computed with only those harmonics which are "significant" or which "significantly contribute" to the variability of $u_\tau$. For instance, for monthly series $\omega = 12$ and the total number of harmonics is 6 with j = 1, 2, 3, 4, 5 and 6. However, for a particular case, only h* = 3 harmonics may be needed, such as j = 1, 2, 3, or j = 1, 2, 5, or any other combination of 3 harmonics which are shown to be the most significant. The criteria and procedures for selecting the harmonics to be used in Eq. (3.30) are discussed in the next section.

3.3.3 SELECTION OF SIGNIFICANT HARMONICS AND FOURIER COEFFICIENTS

Experience in using Fourier analysis for estimating periodic parameters of hydrologic time series shows that for small time interval series, such as daily and weekly series, only the first 4-6 harmonics are necessary for a good Fourier series fit of the periodic estimate $u_\tau$. For instance, when dealing with daily series there are a total of 182 harmonics. However, out of this number rarely more than 6 harmonics are needed for obtaining a Fourier series estimate $\hat{v}_\tau$ which will closely fit the estimate $u_\tau$. A similar situation occurs for the case of weekly series or any other series with a relatively small time interval. For monthly series, about 4 harmonics may give a good fit, although often in practice the Fourier analysis is not used for such a type of series. Similarly, for bimonthly, quarterly or similar series the Fourier analysis is not applied. The above practical criteria can be used as a first guide for finding the number of harmonics and corresponding Fourier coefficients which will enter into Eq. (3.30). However, such criteria should be supplemented by more precise analysis and tests. Some approximate tests based on theoretical and experimental results are given by Yevjevich (1972b).

A graphical and likely the most accurate test for selecting the harmonics in the Fourier series fit of a periodic estimate is the so-called "cumulative periodogram test".
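The Fourier coefficients of Eqs. (3.31) through (3.35), on which this test builds, and the fit of Eq. (3.30) with a chosen subset of harmonics can be sketched as follows. The function names and the twelve monthly values are hypothetical and serve only to illustrate the computation.

```python
import math


def fourier_coefficients(u):
    """Mean and Fourier coefficients of a periodic estimate u_tau, tau = 1..w.

    Follows Eqs. (3.31) to (3.35): for even w the last harmonic h = w/2
    uses the factor 1/w and has B_h = 0.
    Returns (u_bar, A, B), where A[j-1] and B[j-1] belong to harmonic j.
    """
    w = len(u)
    h = w // 2  # w/2 for even w, (w-1)/2 for odd w
    u_bar = sum(u) / w
    A, B = [], []
    for j in range(1, h + 1):
        c = sum(u[t - 1] * math.cos(2 * math.pi * j * t / w) for t in range(1, w + 1))
        s = sum(u[t - 1] * math.sin(2 * math.pi * j * t / w) for t in range(1, w + 1))
        if w % 2 == 0 and j == h:
            A.append(c / w)
            B.append(0.0)
        else:
            A.append(2.0 * c / w)
            B.append(2.0 * s / w)
    return u_bar, A, B


def fourier_fit(u_bar, A, B, w, harmonics):
    """Evaluate Eq. (3.30) for tau = 1..w using only the listed harmonics."""
    fit = []
    for t in range(1, w + 1):
        value = u_bar
        for j in harmonics:
            angle = 2 * math.pi * j * t / w
            value += A[j - 1] * math.cos(angle) + B[j - 1] * math.sin(angle)
        fit.append(value)
    return fit


# Hypothetical monthly means (w = 12), for illustration only.
u = [30.0, 28.0, 35.0, 52.0, 80.0, 95.0, 70.0, 45.0, 33.0, 29.0, 27.0, 31.0]
u_bar, A, B = fourier_coefficients(u)
full_fit = fourier_fit(u_bar, A, B, len(u), range(1, 7))  # all 6 harmonics
smooth_fit = fourier_fit(u_bar, A, B, len(u), [1, 2])     # first 2 harmonics only
print(max(abs(f - v) for f, v in zip(full_fit, u)))       # close to zero
```

With all harmonics the fit reproduces the sample estimates to within roundoff, which is the property stated for Eq. (3.30); with fewer harmonics the fitted sequence is smoother than the sample estimates.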
Consider the periodic estimate $u_\tau$ with a mean $\bar{u}$ determined by Eq. (3.31). Such an estimate could be either the periodic mean $\bar{y}_\tau$, standard deviation $s_\tau$ or correlation coefficient $r_{k,\tau}$ of Eqs. (2.7), (2.8) and (2.10), respectively, or any other similar periodic estimate. The mean squared deviation (MSD) of $u_\tau$ around $\bar{u}$ (equivalent to the definition of variance in statistical terms) may be determined by

$$MSD(u) = \frac{1}{\omega} \sum_{\tau=1}^{\omega} (u_\tau - \bar{u})^2 \qquad (3.36)$$

On the other hand, consider the Fourier series estimate $\hat{v}_\tau$ of Eq. (3.30) with harmonics j = 1, 2, ..., h and corresponding Fourier coefficients $A_j$ and $B_j$. The mean square deviation of $\hat{v}_\tau$ around $\bar{u}$ is composed of the MSD(j) of each harmonic j, which are determined by

$$MSD(j) = \frac{1}{2} (A_j^2 + B_j^2), \quad j = 1, \ldots, h \qquad (3.37)$$

with $A_j$ and $B_j$ obtained from Eqs. (3.32) through (3.35). It can be shown that the sum of all the values of MSD(j) is equal to MSD(u) of Eq. (3.36).

Assume that the values of MSD(j) are arranged in decreasing order of magnitude so that $MSD(h_i)$, i = 1, ..., h, represents the ordered sequence, $h_1$ is the harmonic with the highest MSD and $h_h$ is the harmonic with the lowest MSD. The cumulative periodogram $P_i$ is the ratio of the sum of the first i mean square deviations $MSD(h_j)$, j = 1, ..., i, to the mean square deviation MSD(u) of Eq. (3.36). That is,

$$P_i = \frac{\sum_{j=1}^{i} MSD(h_j)}{MSD(u)}, \quad i = 1, \ldots, h \qquad (3.38)$$

The plot of $P_i$ vs. i is called the cumulative periodogram. A graphical criterion using the cumulative periodogram for obtaining the significant harmonics is given below. The criterion is based on the concept that the variation of $P_i$ versus i is composed of two distinct parts: (1) a periodic part of a fast increase of $P_i$ with i and (2) a sampling part of a slow increase of $P_i$ with i. Two approaches are feasible for determining those two parts. In the first approach the two parts are approximated by smooth curves that intersect at a point, which corresponds to the critical harmonic h* that gives the number of significant harmonics.
The second approach is to assume approximate mathematical models of these two parts, to estimate their parameters and to find the intersection of the two equations. The ordered harmonic nearest to the intersection point is then the critical harmonic. In the second approach, when $z_{\nu,\tau}$ of Eq. (3.25) is an independent series, the sampling part of the cumulative periodogram referred to above is a straight line, whereas when $z_{\nu,\tau}$ is a linearly dependent series, the sampling part is a curve.

Figures 3.1 and 3.2 show the intersection point A for a periodic series with either an independent or a dependent stochastic component, respectively. The value of $P_i$ at point A is determined by the sample size, while the value of i is much less affected by the sample size and sampling variation. Difficulties arise when the point A for a dependent stochastic component is in such a position that both curves (3) and (4) of Fig. 3.2 come out to be nearly one continuous curve, implying that the separation of the two parts of the cumulative periodogram becomes uncertain. Examples show that this case is less common in practice.

Figures 3.3 through 3.7 show the cumulative periodograms for five statistical characteristics: the mean $\bar{y}_\tau$, the standard deviation $s_\tau$, and the first, second and third serial correlation coefficients $r_{1,\tau}$, $r_{2,\tau}$ and $r_{3,\tau}$, for five discrete series: (1) 69 years of daily precipitation at Fort Collins, Colorado, from 1898 to 1966, Fig. 3.3; (2) 70 years of 3-day precipitation at Austin, Texas, from 1898 to 1967, Fig. 3.4; (3) 18 years of 7-day precipitation at Ames, Iowa, from 1949 to 1966, Fig. 3.5; (4) 40 years of daily discharge of the Tioga River near Erwins, New York, from 1921 to 1960, Fig. 3.6; and (5) 37 years of 3-day discharge of the McKenzie River at McKenzie Bridge, Oregon, from 1924 to 1960, Fig. 3.7. The harmonic i ranges from 1 to 182 for daily series, from 1 to 60 for 3-day series, and from 1 to 26 for 7-day series. Because other precipitation and river gaging stations for
Because other precipitation and river gaging stations for 80Figure 3.1 Figure 3.2. Separation of the cumulative periodogram into the periodic part, for both the observed (1) and the fitted (3), and the sampling variation part, also for both the observed (2) and the fitted (4), in case of a periodic series with an independent stochastic component. L09y i ol ee eae we 0246 creas Separation of the cumulative periodogram into the periodic part, observed (1) and fitted (3), and the sampling variation part, observed (2) and fitted (4), in case of a periodic series with an autoregressive stochastic component 81025406080 100 120 140 160 180200 Figure 3.3. Cumulative periodogram of five parameters of daily precipitation series, Fort Collins, Colorado. Figure 3.4. Cumulative periodogram of five parameters of 3-day precipitation series, Austin, Texas. 82Figure 3.5. Cumulative periodogram of five parameters of 7-day precipitation series, Ames, [owa ‘0204060 a0 100 120-140 160180 200 Figure 3.6. Cumulative periodogram of five parameters of daily flow series of the Tioga River. 83,Figure 3.7. Cumulative periodogram of five parameters of 3-day flow series of the McKenzie River , 3-day and 7-day discrete series show results that are similar to those of Fig. 3.3 through 3.7, the following conclusions are generally valid and can be applied in most cases (1) The estimated means y, and the estimated standard t deviations s_ for the precipitation series are periodic whose cumulative périodograms P, are composed of two parts: a steep rise from i= to about as a result of the periodic part, then a slow rise following approximately a straight line due to the independent stochastic component. The curve P. vs. i for y_ is always above the corresponding curve for s_; this differénce comes from the larger sampling variation of the second moment s? 
than of the first moment $\bar{y}_\tau$.

(2) The cumulative periodograms for the serial correlation coefficients $r_{1,\tau}$, $r_{2,\tau}$ and $r_{3,\tau}$, computed after the periodicities in the mean ($\bar{y}_\tau$) and in the standard deviation ($s_\tau$) are removed from the original precipitation series, show approximately a straight line relationship. This indicates that the residual series $z_{\nu,\tau}$ of Eq. (3.25) would be an independent series.

(3) The cumulative periodograms of the estimated means $\bar{y}_\tau$ and the estimated standard deviations $s_\tau$ of one-day, three-day and seven-day runoff series show a sharp rise from i=1 to about i=3-6, then a slow curvilinear rise over the remaining harmonics. The first part of the cumulative periodogram indicates a significant periodicity, whereas the second part indicates that $z_{\nu,\tau}$ of Eq. (3.25) would be a dependent stochastic component.

(4) Rivers with runoff predominantly produced by rainfall show no periodicity in serial correlation coefficients, as shown by Fig. 3.6.

(5) Rivers greatly affected by snow accumulation and melt, or river regimes with combined runoff from rainfall and snowmelt, usually show periodicity in serial correlation coefficients, as shown by Fig. 3.7.

(6) As expected, the sample size affects the smoothness and reliability of the cumulative periodogram $P_i$ vs. i, as shown by comparing Fig. 3.5 with 18 years of data and Figs. 3.3 and 3.4 with about 70 years of data.

The mean square deviation MSD(j) of Eq. (3.37) is often interpreted as the part of the variance of $u_\tau$ which is contributed by the harmonic j. Hence, the cumulative periodogram $P_i$ would represent the explained variance yielded by the first i harmonics arranged in decreasing order of magnitude of the sequence MSD(j), j = 1, ..., h. Using this concept, a criterion often used in practice for determining the number of significant harmonics has been to set a fraction or percentage of explained variance, say 90% or 95%, and pick the number of harmonics which "explain" that specified amount of variance.
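The quantities of Eqs. (3.36) through (3.38), together with the explained-variance rule just described, can be computed as in the sketch below. The function names and the monthly values are hypothetical. One assumption of the sketch should be noted: for an even number of intervals the last harmonic is given the contribution $A_h^2$, with $A_h$ from Eq. (3.34), so that the values of MSD(j) add up to MSD(u) exactly.

```python
import math


def harmonic_msd(u):
    """MSD(j) of each harmonic j = 1..h for a periodic estimate u (Eq. 3.37).

    Assumption of this sketch: for even w the last harmonic contributes
    A_h**2 (A_h from Eq. 3.34) so that the MSD(j) sum to MSD(u) exactly.
    """
    w = len(u)
    h = w // 2
    msd = []
    for j in range(1, h + 1):
        c = sum(u[t - 1] * math.cos(2 * math.pi * j * t / w) for t in range(1, w + 1))
        s = sum(u[t - 1] * math.sin(2 * math.pi * j * t / w) for t in range(1, w + 1))
        if w % 2 == 0 and j == h:
            msd.append((c / w) ** 2)
        else:
            msd.append(0.5 * ((2 * c / w) ** 2 + (2 * s / w) ** 2))
    return msd


def cumulative_periodogram(u):
    """Return (ordered harmonics h_i, P_i) following Eqs. (3.36) and (3.38)."""
    w = len(u)
    u_bar = sum(u) / w
    msd_u = sum((v - u_bar) ** 2 for v in u) / w
    msd = harmonic_msd(u)
    # Harmonics sorted by decreasing MSD(j).
    order = sorted(range(1, len(msd) + 1), key=lambda j: msd[j - 1], reverse=True)
    P, running = [], 0.0
    for j in order:
        running += msd[j - 1]
        P.append(running / msd_u)
    return order, P


def harmonics_for_fraction(P, fraction):
    """Smallest i with P_i >= fraction (the explained-variance rule of thumb)."""
    for i, p in enumerate(P, start=1):
        if p >= fraction:
            return i
    return len(P)


# Hypothetical monthly means (w = 12), for illustration only.
u = [30.0, 28.0, 35.0, 52.0, 80.0, 95.0, 70.0, 45.0, 33.0, 29.0, 27.0, 31.0]
order, P = cumulative_periodogram(u)
print(order[:3], [round(p, 3) for p in P[:3]], harmonics_for_fraction(P, 0.90))
```

Plotting P against i would give the cumulative periodogram itself; the graphical separation into a periodic part and a sampling part still has to be judged by eye or by fitting curves, which this sketch does not attempt.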
However, due to the complexity shown by periodic hydrologic series as described in the above examples (see Figs. 3.3 through 3.7), the criterion based on a given percentage of explained variance would be rather arbitrary and we would prefer to avoid it, unless it is used only as a rough guide or first guess of the significant harmonics.

Summarizing, for selecting the number of significant harmonics to be used in the Fourier series estimate of periodic parameters we advise the following steps:

(1) Use the cumulative periodogram of the periodic estimate $u_\tau$ and select the number of harmonics h* by defining graphically the intersection point A as explained above.

(2) Determine the fitted periodic estimate $\hat{v}_\tau$ based on the number of harmonics h* selected in (1) and compare it graphically with the estimate $u_\tau$.

(3) (Optional) To further verify the Fourier series estimates $\hat{\mu}_\tau$ and $\hat{\sigma}_\tau$ of the periodic mean and standard deviation, respectively, compare the skewness and correlation coefficients of the residuals $z_{\nu,\tau}$ obtained by using (a) $\hat{\mu}_\tau$ and $\hat{\sigma}_\tau$ and (b) $\bar{y}_\tau$ and $s_\tau$ in Eq. (3.25). If such computed properties are statistically comparable, then the fitted $\hat{\mu}_\tau$ and $\hat{\sigma}_\tau$ would be appropriate.

Once the number of significant harmonics h* is determined, the Fourier series estimate $\hat{v}_\tau$ of Eq. (3.30) becomes

$$\hat{v}_\tau = \bar{u} + \sum_{i=1}^{h^*} \left[ A_{h_i} \cos(2\pi h_i \tau/\omega) + B_{h_i} \sin(2\pi h_i \tau/\omega) \right] \qquad (3.39)$$

where $h_i$, i = 1, ..., h*, are the significant harmonics and $A_{h_i}$ and $B_{h_i}$ are the corresponding Fourier coefficients.

3.4 ESTIMATION OF PARAMETERS OF MULTIVARIATE MODELS

The estimation of parameters of multivariate models in general, presented in Chapter 7, and of disaggregation models in particular, presented in Chapter 8, requires the solution of the matrix equation $B B^T = D$. That is, given that the elements of the matrix D are known, it is necessary to find the elements of a matrix B such that the product of B times its transpose $B^T$ is equal to D. This section deals exclusively with the solution of this type of matrix equation.
Any solution for B which will produce $B B^T = D$ is a valid solution. In general, there exists an infinite number of solutions which will reproduce D. The solution of $B B^T = D$ may be obtained by principal component analysis, but B is not uniquely identified and it is a rather complex procedure. However, if B is assumed to be a lower triangular matrix (see Appendix A1.3), then a unique solution can be found by the square root method (Young and Pisano, 1968) when D is a positive definite matrix, or by a method proposed by Lane (1979) when D is at least a positive semidefinite matrix. Therefore, if B is a lower triangular matrix and D is a positive definite matrix, then the non-zero elements of B may be determined by (Graybill, 1969)

$$b^{i1} = \frac{d^{i1}}{(d^{11})^{1/2}}, \quad \text{for } j = 1, \; i = 1, \ldots, n \qquad (3.40)$$

$$b^{jj} = \left[ d^{jj} - \sum_{k=1}^{j-1} (b^{jk})^2 \right]^{1/2}, \quad \text{for } j = 2, \ldots, n, \; i = j \qquad (3.41)$$

and

$$b^{ij} = \frac{d^{ij} - \sum_{k=1}^{j-1} b^{ik}\, b^{jk}}{b^{jj}}, \quad \text{for } j = 2, \ldots, n-1, \; i = j+1, \ldots, n \qquad (3.42)$$

where $b^{ij}$ are the elements of B, $d^{ij}$ are the elements of D and n is the size of the matrices B and D.

On the other hand, if B is a lower triangular matrix but D is either a positive definite or a positive semidefinite matrix, then the elements of B may be determined by (Lane, 1979)

$$b^{kj} = 0, \quad \text{for all } k < j \qquad (3.43)$$

$$b^{kj} = 0, \quad \text{for all } k \ge j, \text{ when } d^{jj} - \sum_{i=1}^{j-1} (b^{ji})^2 \le 0 \qquad (3.44)$$

$$b^{kj} = \frac{d^{kj} - \sum_{i=1}^{j-1} b^{ji}\, b^{ki}}{\left[ d^{jj} - \sum_{i=1}^{j-1} (b^{ji})^2 \right]^{1/2}}, \quad \text{when } d^{jj} - \sum_{i=1}^{j-1} (b^{ji})^2 > 0 \qquad (3.45)$$

Equations (3.43), (3.44) and (3.45) are applied first to calculate the first matrix column, top to bottom, then the second column, the third, etc. It can be easily shown that $B B^T$ must always be at least positive semidefinite. However, due to computer roundoff errors, it is common for a singular matrix to appear to be negative indefinite. Equation (3.44) overcomes this predicament, in addition to handling the singular case. This solution will be referred to in later chapters in order to solve matrix products of the same form. One should always check that the solution for B, when postmultiplied by its own transpose, reproduces the original matrix D.
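The column-by-column procedure described above can be sketched as follows. The code follows the pattern of Eqs. (3.43) through (3.45), setting a column to zero when its pivot is not positive, and then performs the recommended check that the product reproduces D. The function names, the tolerance argument and the two test matrices are assumptions introduced for illustration, not part of the original method description.

```python
import math


def lower_triangular_root(D, tol=0.0):
    """Lower triangular B with B B^T = D, built column by column.

    If the pivot d_jj - sum_i (b_ji)^2 is not positive, column j of B is
    left at zero, which covers a positive semidefinite D and pivots that
    turn slightly negative through roundoff.
    `tol` is a hypothetical threshold introduced for this sketch.
    """
    n = len(D)
    B = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = D[j][j] - sum(B[j][i] ** 2 for i in range(j))
        if pivot <= tol:
            continue  # column j stays zero
        root = math.sqrt(pivot)
        for k in range(j, n):
            B[k][j] = (D[k][j] - sum(B[j][i] * B[k][i] for i in range(j))) / root
    return B


def times_transpose(B):
    """Return B B^T, used to check that D is reproduced."""
    n = len(B)
    return [[sum(B[r][i] * B[c][i] for i in range(n)) for c in range(n)]
            for r in range(n)]


def max_abs_error(M1, M2):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(a - b) for r1, r2 in zip(M1, M2) for a, b in zip(r1, r2))


# Positive definite example.
D1 = [[4.0, 2.0, 0.4], [2.0, 2.0, 0.5], [0.4, 0.5, 1.0]]
# Singular (positive semidefinite) example: the third variable equals the first.
D2 = [[1.0, 0.5, 1.0], [0.5, 1.0, 0.5], [1.0, 0.5, 1.0]]

for D in (D1, D2):
    B = lower_triangular_root(D)
    print(max_abs_error(times_transpose(B), D) < 1e-12)
```

For the singular example the third column of B comes out as zero, which is the situation the zero-pivot rule is meant to handle; for a matrix that is not positive semidefinite the final check would fail, signalling that D itself needs attention.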
3.5 TESTS OF GOODNESS OF FIT

Various statistical tests are available for testing hypotheses in hydrologic time series modeling. These tests are either approximate, or they are exact provided that the basic conditions assumed in their derivation are satisfied. Mathematical statistics offers a great many parametric and non-parametric tests. However, practice has shown that a small number of these procedures satisfies the needs in the analysis and modeling of hydrologic time series. Three types of tests are mainly found in current hydrologic practice:

(1) Drawing of probability limits around an assumed (hypothesized) population value or function and checking whether the sample estimates fall within or outside these probability limits. An example of such a method is the test of independence of a hydrologic time series based on the correlogram, as described in Sec. 2.

(2) Use of test parameters with known exact or approximate sampling distributions. One such test is the t-test referred to in Sec. 2.2.4 for detecting changes in the mean. Another is the chi-square test for testing the hypothesis of goodness of fit of a given distribution.

(3) Use of the Smirnov-Kolmogorov statistic, namely the maximum absolute difference between the cumulative frequency curve of the sample data and the fitted distribution function. This test statistic is approximate, and cannot compete in reliability with the other two types of tests.

In each of the above three types of tests, a probability level is selected for determining the critical value of the test parameter. If the sample estimate of the test parameter exceeds such critical value (on either of the two sides, or on only one side of its distribution), the assumed hypothesis is rejected.
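As an illustration of the third type of test, the Smirnov-Kolmogorov statistic can be computed as sketched below. This is a minimal sketch under stated assumptions: the function name `ks_statistic` and the sample values are hypothetical, and the fitted distribution is taken to be a normal distribution fitted by the sample mean and standard deviation. No critical values are included; these would be taken from published tables for the selected probability level.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, cdf):
    """Maximum absolute difference between the empirical cumulative
    frequency curve of `sample` and the distribution function `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical curve jumps from (i-1)/n to i/n at x,
        # so both sides of the step are compared with the fitted value.
        d_max = max(d_max, abs(i / n - f), abs(f - (i - 1) / n))
    return d_max


# Hypothetical residuals compared with a normal distribution fitted to them.
data = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]
fitted = NormalDist(mean(data), stdev(data))
d_normal = ks_statistic(data, fitted.cdf)
```

Any distribution function can be passed as `cdf`, so the same sketch applies to other fitted distributions.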
The modeling of a hydrologic time series usually assumes that the stochastic component, after removing the periodic components and the time dependence structure, is an independent and normally distributed series. Similarly, when modeling multivariate series, the assumptions are that the stochastic component is independent in time and space as well as normal. Tests for these assumptions are described below. In addition, the comparison of the historical and model correlograms and of the historical and generated statistics may be used for further testing the fitted time series model. Such comparisons are explained in Chapters 4 and 7.

3.5.1 TEST OF INDEPENDENCE

Test of Independence in Time

The Anderson test of the correlogram and the Porte Manteau lack of fit test are usually applied for testing the independence of a time series. The Anderson test was given in Sec. 2.3 when describing the characteristics of annual time series, so it is not repeated here. The Porte Manteau lack of fit test was utilized by Box and Pierce (1970) as an approximate test of model adequacy. Hipel and McLeod (1977), Delleur (1978) and others applied it for verifying linear models of hydrologic time series. The cumulative periodogram can also be used for testing the independence of a series, especially when it is derived from a series which originally had periodic components. Both the Porte Manteau lack of fit test and the cumulative periodogram test are described below.

The Porte Manteau Lack of Fit Test. Consider that a time series $x_t$ of size $N$ is represented by an ARIMA($p,d,q$) model (see Sec. 6.1), where $p$ is the number of autoregressive terms, $d$ is the number of differences and $q$ is the number of moving average terms. Assume that after $d$ differences the ARMA($p,q$) series $z_t$, $t = 1, \ldots, N-d$, is obtained, and assume further that in such models $\varepsilon_t$ is the residual series.
We would like to apply the Porte Manteau lack of fit test to check whether $\varepsilon_t$ is an independent series and, hence, whether the models are adequate. This test uses the statistic

$$Q = (N-d) \sum_{k=1}^{L} r_k^2(\varepsilon) \tag{3.46a}$$

where $r_k(\varepsilon)$ is the correlogram of the residuals $\varepsilon_t$ and $L$ is the maximum lag considered. The statistic $Q$ is approximately chi-square distributed with $L-p-q$ degrees of freedom. The adequacy of the ARIMA model for $x_t$ or of the ARMA model for $z_t$ may be checked by comparing the statistic $Q$ with the chi-square value $\chi^2(L-p-q)$ of a given significance level. If $Q < \chi^2(L-p-q)$, $\varepsilon_t$ is taken to be an independent series and the model is adequate; otherwise the model is inadequate.

[...]

$$\left[ -u_{1-\alpha/2} \sqrt{6/N},\;\; u_{1-\alpha/2} \sqrt{6/N} \right] \tag{3.53}$$

where $u_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution. Therefore, if $\hat{\gamma}$ of Eq. (3.52) falls within the limits of expression (3.53), the hypothesis of normality is accepted. Otherwise it is rejected.

Actually, the above test is sufficiently accurate for $N > 150$. For smaller sample sizes, Snedecor and Cochran suggest instead comparing the computed coefficient of skewness $\hat{\gamma}$ of Eq. (3.52) with a tabulated value $\gamma_\alpha(N)$ which depends on the selected probability level $\alpha$ and on the sample size $N$. Table 3.2 gives the values of $\gamma_\alpha(N)$ for $\alpha = 0.02$ and $0.10$ and for various values of $N$. Thus, if $|\hat{\gamma}| < \gamma_\alpha(N)$, the hypothesis of normality is accepted.

Table 3.2. Table of Skewness Test for Normality for Sample Size Less than 150 (after Snedecor and Cochran, 1967, p. 552)

    N    α = 0.02   α = 0.10        N    α = 0.02   α = 0.10
   25     1.061      0.711         70     0.673      0.459
   30     0.986      0.662         80     0.631      0.432
   35     0.923      0.621         90     0.596      0.409
   40     0.870      0.587        100     0.567      0.389
   45     0.825      0.558        125     0.508      0.350
   50     0.787      0.534        150     0.464      0.321
   60     0.723      0.492        175     0.430      0.298
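The two residual checks of this section, the Porte Manteau statistic $Q$ and the large-sample skewness test, can be sketched as follows. Several points are assumptions made only for this illustration: the function names, the particular estimators used for the lag-$k$ correlation and for the skewness coefficient (the usual moment estimators, which may differ in detail from those of Chapter 2 and Eq. (3.52)), and the use of the standard large-sample result that the skewness of a normal sample has variance of about $6/N$. The chi-square critical value is not computed here; it is passed in from a table.

```python
from statistics import NormalDist


def autocorr(x, k):
    """Lag-k sample autocorrelation r_k (moment estimator, an assumption)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    ck = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k))
    return ck / c0


def portmanteau_q(residuals, max_lag):
    """Q = n * sum of r_k^2 for k = 1..L, with n = len(residuals),
    i.e. N - d when the residuals come from a d-times differenced series."""
    n = len(residuals)
    return n * sum(autocorr(residuals, k) ** 2 for k in range(1, max_lag + 1))


def residuals_independent(residuals, max_lag, chi2_critical):
    """True if Q is below the tabulated chi-square value with L-p-q d.f.
    `chi2_critical` must be supplied by the user from a chi-square table."""
    return portmanteau_q(residuals, max_lag) < chi2_critical


def skewness(x):
    """Sample skewness coefficient from the second and third central moments."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def skewness_limit(n, alpha):
    """Large-sample limit u_{1-alpha/2} * sqrt(6/N) for the skewness test."""
    return NormalDist().inv_cdf(1 - alpha / 2) * (6 / n) ** 0.5


def normality_accepted(x, alpha):
    """Large-sample skewness test; intended for N > 150 only."""
    return abs(skewness(x)) < skewness_limit(len(x), alpha)
```

For $N = 150$ the large-sample limits are about 0.465 for $\alpha = 0.02$ and 0.329 for $\alpha = 0.10$, close to the values 0.464 and 0.321 listed in Table 3.2. For smaller samples the computed skewness should be compared with the tabulated $\gamma_\alpha(N)$ rather than with `skewness_limit`.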
