
A COHERENT EXAMINATION OF RAINFALL AND FLOOD
DATA IN SOME SELECTED SITES OF PAKISTAN

BY
ABAD ALI

ROLL NO: 0563-M.Phil-STAT-2011


SESSION 2011-2013

DEPARTMENT OF STATISTICS
G C UNIVERSITY, LAHORE (PAKISTAN)
DECLARATION
I, Abad Ali, Roll No. 0563-M.Phil-STAT-2011, student of M.Phil. in the subject of Statistics,
Session 2011-13, hereby declare that the matter printed in the thesis titled "A COHERENT
EXAMINATION OF RAINFALL AND FLOOD DATA IN SOME SELECTED SITES OF
PAKISTAN" is my own work and has not been printed, published or submitted in the form of a
thesis in any University or Research Institute, etc. in Pakistan or abroad.

Dated: ______________

ABAD ALI

THESIS COMPLETION CERTIFICATE


Certified that the research work contained in this thesis titled "A COHERENT
EXAMINATION OF RAINFALL AND FLOOD DATA IN SOME SELECTED SITES OF
PAKISTAN" has been carried out and completed by Mr. ABAD ALI, Roll No.
0563-M.Phil-STAT-2011, under my supervision.

Supervisor
Dated: _____________

__________________________
Prof. Dr. Saleha Naghmi Habibullah
Visiting Professor
Department of Statistics,
GC University Lahore

Submitted Through:

________________
Mr. Jaffer Hussain
Chairperson, Department of Statistics
GC University Lahore

___________________
Controller of Examination
GC University Lahore

ACKNOWLEDGEMENT
None is worthy of praise except gracious ALLAH, Who created the worlds of numerous
creatures in the capacity of Absolute Authority. Almighty Allah has opened new dimensions
of knowledge for me and has led me to complete this task. All my respects to Almighty Allah's
last Prophet HAZRAT MUHAMMAD (peace be upon him), who is the great mentor of the
world. He enabled us to recognize the Creator of the world and to understand the philosophy of
life.
I feel great pleasure in expressing heartfelt gratitude to my Supervisor, who has
been cooperative in all circumstances. I am extremely appreciative of her keen interest,
motivational behavior, tolerance and inspiring guidance, which enabled me to surmount this uphill
task. It has been a great honor for me to work under her supervision. Her comments and valuable
suggestions have played a vitally important role.
I am also very thankful to Prof. Sam C. Saunders, Prof. Emer., Washington State University, USA,
for his valuable comments, suggestions and guidance throughout the period of my research.
I gratefully acknowledge Mr. Jaffer Hussain, Chairperson, Department of Statistics, GC
University, Lahore, for his polite, helpful, encouraging and motivating attitude, which helped me
complete this task. Last but not least, I would like to thank my dearest mother and my family members.

ABBREVIATIONS
AEP     Annual Exceedance Probability
MOM     Method of Moments
ECDF    Empirical Cumulative Distribution Function
MLE     Maximum Likelihood Estimator
GOF     Goodness of Fit
EVT     Extreme Value Theory
GEVD    Generalized Extreme Value Distribution
IWD     Inverse Weibull Distribution
PPT     Phillips-Perron Test
MAPE    Mean Absolute Percentage Error
MAD     Mean Absolute Deviation
MSD     Mean Squared Deviation

TABLE OF CONTENTS
Chapter 1 Introduction
1.1 Preliminary Remarks
1.2 Global warming
1.3 Extreme Events
1.4 Precipitation
1.5 Rainfall
1.5.1 Intensity of Rainfall
1.5.2 Rainfall measurement
1.5.3 Rain fall impact on human life
1.5.4 Effects on agriculture
1.5.5 Effect on culture aspect
1.5.6 Influence of rainfall on Pakistan
1.6 Seasonal Environment of Pakistan
1.6.1 Winter
1.6.2 Monsoon
1.6.3 Pre Monsoon
1.6.4 Post Monsoon
1.7 Records and forecasting of weather in Pakistan
1.7.1 Hydraulic Structure
1.8 Flood
1.8.1 Flood damaging
1.8.2 Health Hazard
1.8.3 Agriculture, Livestock, and Fisheries
1.8.4 Education
1.8.5 Energy
1.8.6 Transport & Communication
1.8.7 Environment
1.9 Area used for analysis

Chapter 2 Literature Review
2.1 Introduction

Chapter 3 Methodology
3.1 Introduction
3.2 Quantile
3.3 Exceedance Probability
3.4 Method of Estimations
3.5 Method of moments
3.6 Maximum Likelihood Method
3.7 Trend analysis by Graphical technique
3.8 Q-Statistics
3.9 Autocorrelation
3.10 Class of Distributions
3.10.1 Extreme Value Distribution
3.10.2 Generalized Extreme Value Distribution
3.10.3 Exponential Distribution
3.10.4 Gamma Distribution
3.10.5 Normal Distribution
3.10.6 Three parameters Log-Normal Distribution
3.10.7 Logistic Distribution
3.10.8 Nakagami Distribution
3.10.9 Weibull Distribution
3.10.10 Inverse Gaussian distribution
3.10.11 Rayleigh Distribution
3.10.12 Frechet Distribution
3.11 Goodness of fit test
3.12 Probability Plots
3.13 Utilization of Software

Chapter 4 Least Squares Analysis
4.1 Introduction
4.2 Linear Models
4.3 Least Square estimates
4.4 Missing data analysis
4.5 Analysis of Mangla Flood Peaks
4.6 Analysis of site Shahdara Flood Peaks
4.7 Analysis of site Balakot Rainfall Station
4.8 Fitting parabolic trend to Flood and Rainfall data
4.9 Measures of accuracy for time series data
4.10 Risalpur Rainfall Site
4.11 Dir Rainfall
4.12 Kohat Rainfall
4.13 Marala Rainfall Site
4.14 D.G khan Site
4.15 Terbela Flood Site
4.16 Muzafarabad
4.17 Correlation Between rainfall and flood peaks
4.17.1 Lag Correlation
4.17.2 Cross correlation

Chapter 5 Record values
5.1 Introduction
5.2 Probability Density Function of Upper Record Values
5.3 Probability Density Function of Lower Record Values
5.4 Properties of Record Values
5.5 Properties for Lower record values
5.6 Lower Record Values from IWD
5.7 Lower Record Values from Frechet distribution
5.8 Maximum Likelihood Method
5.9 Means and Variances

Chapter 6 Stationary Models
6.1 Introduction
6.2 Tests For The Detection Of Stationarity
6.3 Unit Root Test
6.4 Dickey And Fuller Test
6.5 Phillips-Perron (PP) Unit Root Test
6.4 Analysis Of The Stationary Time Series Data At Different Sites
6.4.1 Marala Site Flood Peaks
6.4.2 Tarbela Site
6.4.3 Shahdara Site
6.4.4 Mangla Site
6.4.4 Muzafarabad Site
6.4.4 Balakot Site
6.4.4 Risalpur Site
6.4.4 Kohat Site
6.4.4 Dir Site

Chapter 7 Probability distributions
7.1 Introduction
7.2 Marala Flood Peaks
7.2.1 Exponential Distribution
7.2.2 Gamma Distribution
7.2.3 Normal Distribution
7.2.4 Log-Normal Distribution
7.2.5 Logistic Distribution
7.2.6 Nakagami Distribution
7.2.7 Weibull Distribution
7.2.8 Burr Distribution
7.2.9 Inverse Gaussian Distribution
7.2.10 Rayleigh Distribution
7.2.11 Generalized Extreme Value Distribution
7.2.12 Frechet Distribution
7.3 Shahdra Flood Peaks
7.4 Mangla Flood Peaks
7.5 Kohat
7.6 Muzafarabad Rainfall Site

Chapter 8 Expected loss
8.1 Introduction
8.2 A Decision Based upon Expected Loss
8.3 Pareto Distribution
8.4 Maximum Likelihood Method of Pareto Distribution
8.5 The variance and covariance of these MLE
8.6 Method of Moments
8.7 The variance and covariance matrix
8.8 Probability Weighted Moments

REFERENCES

CHAPTER 1
INTRODUCTION
1.1 Preliminary Remarks

Climate change is one of the hottest topics among the scientific community of the world today.
In particular, the phenomenon of global warming is a cause for grave concern for
meteorologists, oceanographers, and many other categories of scientists. United Nations bodies
(such as the World Health Organization) and national organizations (such as the US
Environmental Protection Agency) are investigating the risks to the inhabitants of this world due
to global warming.
One very important area of concern linked with the phenomenon of global warming is the
occurrence of flooding that is liable to cause loss of life and heavy damages to property. In
particular, people in the developing countries suffer heavily due to the damages caused by
excessive rain and flooding. Pakistan, for example, has experienced a number of floods during
the past few decades some of which caused excessive damage to life and property.

1.2 Global warming

The phenomenon of global warming has been taking place on planet Earth for the past 15,000
years. Global warming is the gradual increase in the temperature of the planet. It is also
referred to as the greenhouse effect. Greenhouse gases (e.g. carbon monoxide, carbon dioxide,
sulphur dioxide, etc.) are the main cause of global warming. These gases are released by human
beings, industries and vehicles. According to scientists, these gases affect the atmosphere
severely. They damage the ozone layer, which protects the Earth against the ultraviolet rays
emitted by the sun. As a result, the Earth receives more heat, which leads to global warming.
According to the vast majority of scientists today, this undesirable phenomenon
is the result of industrialization, a significant increase in the human population and other factors
for which we ourselves are responsible.
Global warming exerts a variety of negative effects on the planet and its inhabitants, such as
reduction of territories, damage to marine ecologies, destruction of seasonal insects and many
others. Extreme events such as severe storms, cyclones and hurricanes can cause enormous
damage and destruction of infrastructure. People may experience increased water loss from
reservoirs due to dryness, long summers and short winters, as well as extreme temperatures in both
summer and winter. Sometimes the situation may become very grave, as in the case of
severe famines.
1.3 Extreme Events

An event is considered extreme if its magnitude differs greatly from its normal value, for
example, inundations, droughts and earthquakes. Extreme events affect human life
and property to a large extent. As such, attempts at accurate modeling of extreme events
carry great significance for protecting human life and property.
During the past few decades, researchers have been greatly interested in studying extreme events,
including both lower and upper extremes. Both the smallest value and the largest
value of a data set are studied as extreme events. Record values (lower and
upper) based on various probability distributions have also been analyzed.

1.5 Rainfall

Precipitation in liquid form is called rain. Surface water on the Earth's crust is evaporated by
the sun's rays and forms clouds, and then returns to the Earth's surface in the form of
drops. This type of precipitation is called rainfall.
1.5.1 Intensity of rainfall

Rainfall has a broad effect on socio-economic life and human culture, so it is necessary to
measure it. The intensity levels are classified as follows. Light rain is rated as less than 0.098
inches per hour.
Rain is considered moderate when it lies between 0.098 and 0.3 inches per hour. Heavy rain is
rated as 0.3 to 2 inches per hour. Extreme rain is rated as 2 inches per hour or more.
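The intensity classes above can be written as a simple lookup. The following Python sketch is
illustrative only: the function name is hypothetical, and the treatment of a rate that falls exactly
on a class boundary (assigned here to the higher class) is an assumption, since the text does not
say to which class a boundary value belongs.

```python
def classify_rainfall_intensity(rate_in_per_hr: float) -> str:
    """Return the intensity class for a rainfall rate given in inches per hour.

    Thresholds follow the classification in Section 1.5.1.
    Assumption: a rate exactly on a boundary is placed in the higher class.
    """
    if rate_in_per_hr < 0:
        raise ValueError("rainfall rate cannot be negative")
    if rate_in_per_hr < 0.098:
        return "light"
    if rate_in_per_hr < 0.3:
        return "moderate"
    if rate_in_per_hr < 2.0:
        return "heavy"
    return "extreme"
```

For instance, a rate of 0.2 inches per hour falls in the moderate class and a rate of 2.5 inches
per hour in the extreme class.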
1.5.2 Rainfall measurement

Sectors such as industry, forestry and agriculture require prompt updates of rainfall measurements.
A standardized rain gauge is used to measure rain and snow. A rain gauge is a device that measures
the depth of precipitation per unit area (m2), expressed in millimeters. One millimeter of
rainfall corresponds to one litre of water per square meter. Another unit in use is the inch.
A rain gauge consists of a funnel, open at its upper end, that drains into a storage beaker in
which the depth of the collected water is measured.
The Pakistan Meteorological Department measures rainfall at different stations, and records are
kept at different intervals, such as daily, weekly, monthly and annual.
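The relation between rainfall depth and water volume can be checked with a short calculation: a
depth of 1 mm is 0.001 m, so over 1 square meter it gives 0.001 cubic meters, which is 1 litre.
The Python sketch below illustrates this; the function names are hypothetical and are not taken
from any meteorological software.

```python
MM_PER_INCH = 25.4  # exact, by definition of the international inch


def rainfall_volume_litres(depth_mm: float, area_m2: float) -> float:
    """Volume of water (litres) from a rainfall depth (mm) over an area (m^2).

    depth_mm * 0.001 m/mm * area_m2 gives cubic meters; multiplying by
    1000 L per cubic meter leaves depth_mm * area_m2 litres.
    """
    return depth_mm * area_m2


def inches_to_mm(depth_in: float) -> float:
    """Convert a rainfall depth from inches to millimeters."""
    return depth_in * MM_PER_INCH
```

Under these definitions, 10 mm of rain on a 50 square meter roof corresponds to 500 litres of water.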
1.5.3 Rain fall impact on human life

Rainfall is a natural consequence of the climate on Earth and matters especially for those areas
which are far from irrigation systems. Rainfall has a considerable effect on human beings, including
their moods, celebrations, socio-economic sectors, poetry, etc.
1.5.4 Effects on agriculture

Rainfall is a natural phenomenon that has a considerable impact on human existence. Rainfall at
regular intervals is necessary for plants to survive and grow. Excessive
irregularity in rainfall patterns adversely affects the agriculture sector and its allied
socio-economic sectors, such as irrigation, grain storage and fodder.
1.5.5 Effect on culture aspect

Rainfall has a great effect on the socio-cultural aspect of life. The socio-cultural aspect is
closely related to the economy. A society with strong economic foundations has a more developed
culture. Agricultural economies are largely affected by rainfall patterns, so their lifestyles
and culture are affected as well. Rainfall has a direct influence on the behavior and moods of
people. A country that receives excessive rain considers sunshine a blessing, while a dry land
devoid of rain considers a single drop of rain a great blessing. The absence of rain, as well as
floods, has great psychological and social effects. Poetry, objects, music, attitudes and
literature are also affected by rain. In short, rain as a weather event has a great impact on
human life.
1.5.6 Influence of rainfall on Pakistan

Pakistan is a country of diversity with a generally hot and arid climate. It is an
agricultural country lying between latitudes 24° N and 37° N and longitudes 61° E and 77° E. Some
areas receive heavy rainfall, some moderate rainfall and some light rainfall. Monsoon areas
get heavy rainfall during the monsoon season, and floods are common due to the lack of
proper management of resources.
1.6 Seasonal Environment of Pakistan

Pakistan has a variety of weather of every type. It has the following four seasons:
(i) Winter
(ii) Pre-Monsoon
(iii) Monsoon
(iv) Post-Monsoon

1.6.1 Winter

The winter season in Pakistan mostly falls in December, January, February and March, but it
varies across different areas of Pakistan. Some areas are very cold and some are only
slightly cool. The Himalayan region receives heavy snowfall in the winter season.
1.6.2 Monsoon

The monsoon is well known in Asia for the change in climate and the rainfall it brings.
The word "monsoon" is derived from the Arabic word "mawsam" and the Portuguese word "moncao".
The monsoon winds enter Pakistan from the Indian Ocean and the Arabian Sea. These
winds cause heavy rain and storms in the affected areas of Pakistan, and they are also
a cause of floods in those areas. The monsoon reaches its peak in August. It
lasts from July to September.
1.6.3 Pre Monsoon

Pakistan usually has dry and hot weather in summer. In one respect, this hot season is
harmful for Pakistan, because it melts the ice on glaciers, which causes heavy floods such as the
flood of 2005 in Pakistan. The monsoon starts at the end of the summer season, which is why the
summer season is also called the pre-monsoon. The summer season runs from
April to June.
1.6.4 Post Monsoon

The monsoon ends by the end of September. The post-monsoon period is very short, comprising October
and November. In this period a little rain is received, which is very useful from an agricultural
point of view.
1.7 Records and forecasting of weather in Pakistan

Proper planning of any kind of project or structure depends on proper forecasting of
the relevant events. The Pakistan Meteorological Department is responsible for weather records
and forecasting. In 2010, the highest temperature in Pakistan was recorded on 26
May at Mohenjo-daro, in the province of Sindh. It was the highest reliably measured
temperature ever recorded in Asia.

1.7.1 Hydraulic Structure

Pakistan possesses many kinds of land, including mountains, rivers, deserts and cultivated
land. It is classified into two main regions according to geography: the
first is based on the Indus basin and the second on the dry areas. The irrigation system of Pakistan
is based on different rivers, including (i) Sutlej (ii) Jhelum (iii) Chenab (iv) Indus (v) Ravi. The
hydraulic structure of Pakistan depends upon dams, weirs, barrages, rivers, lakes, etc.
Pakistan faces a dual problem: scarcity of rainfall as well as excessive rainfall and
floods. In some cultivated areas of Pakistan, rainfall is the best means of irrigation,
but the lack of methods for storing surplus rainwater leads to severe
floods.
Climate change associated with global warming has two main categories of cause: (i) natural
variability and (ii) human activities. Greenhouse gases, the use of fossil fuels, the properties of
the land surface, the characteristics of aerosols and natural phenomena are the major causes of
global warming. As temperatures rise, glaciers melt rapidly,
and the resulting overflow of water causes floods.
1.8 Floods

Floods are among the most destructive natural catastrophes and occur in many parts of the world.
They are recognized as among the most costly hazards, with a high tendency to destroy
property as well as human life. They are very hard to predict because
many natural and man-made factors are involved in their occurrence.
Excessive rainfall creates an extreme situation that leads to heavy losses of life and property,
as the excess water turns into a flood. Pakistan has faced high-intensity
floods and flood damage over the last few decades.
1.8.1 Flood damaging

The monsoon in Pakistan can be very severe. Even when a low average rainfall is forecast,
large amounts of rain arrive in mid-August, and heavy rainfall is observed every year in the
southern areas of Pakistan. The heaviest rainfall begins at the start of July
and continues until the last week of September.
1.8.2 Health Hazard

The health infrastructure in rural areas serves every kind of health need and
provides basic first aid. This infrastructure is damaged by rainfall. Basic health
units and rural health centers suffer the most damage, and millions of dollars invested in the
health sector are lost. The excess of rainfall and flooding also needs to be investigated along
with measures of health hazards.
1.8.3 Agriculture, Livestock, and Fisheries

Agriculture has a central role in the growth of the economy. Being a primary activity, it engages a
large manual workforce. The livelihood of most of Pakistan's
population depends directly or indirectly on agriculture. Wheat, the main Rabi crop, is
the staple food of a major portion of the population. Fruits include
grapes, citrus and mangoes, and vegetables include potato, tomato, chilies and onion. Livestock is an
integral part of the agricultural scenario. Buffalo and cattle are the main source of milk, meat and
hides, along with draught power. Fodder crops include wheat straw and maize thinnings.
1.8.4 Education

The education sector has a significant effect on the economy and the development of any
country. The educational institutions in the affected areas (schools, madrasas and colleges) are
spread over long distances. These institutions were constructed without regard to safety against
floods, heavy rainfall and earthquakes. A large proportion of them have been affected
completely or partially by floods. In the flood of 2011, 4096 educational institutes were
damaged in Sindh and Baluchistan. In Sindh, of the 3892 institutes damaged,
1032 were girls' schools that were damaged completely or partially.
1.8.5 Energy

The energy sector is considered the backbone of any country. Many large and small
units work to meet energy requirements for multiple uses. In Pakistan the
following units serve this purpose:
(i) Thermal Plants
(ii) Hydro-electric plants
(iii) Small Nuclear Plants

Most of the energy depends on the hydrological sector. Heavy rainfall and floods cause
heavy damage to these units as well. Well-planned policies can protect these assets and
reduce the cost of damage.
1.8.6 Transport & Communication

Communication and transportation have been a necessity both in recent times and in earlier periods.
The modern world requires a global network to improve the basic elements of the
infrastructure of any community. A large number of modern means of communication and
transportation exist. The fundamental ones in Pakistan are as follows:
(i) Roads
(ii) Railway Lines
(iii) Airports

The total area of Pakistan is 796,095 sq km. The country has 259,618 km of roads,
7,791 km of railway lines and 42 airports. The flood of
2011 destroyed communication and transport infrastructure, including coastal highways,
roads and the railway network. Across two provinces of Pakistan, five districts of Baluchistan and
eighteen districts of Sindh suffered a large amount of destruction in this sector.
1.8.7 Environment

The environment provides the basis for every society, and it is highly affected by extreme
changes. Pakistan is already facing complex problems arising from different kinds of disease that
are related to the environment.

Objectives of the Study

The aim of this study is to identify appropriate statistical distributions for
forecasting extreme rainfall and floods in the coming years. The main objectives of this
research are:
i. To examine the trend and pattern of weather change
ii. To assist in the prediction of flooding under the projected climate change anticipated
during this century due to global warming
iii. To assess the suitable statistical distributions for flood and rainfall data

1.9 Area used for analysis

Most areas of Pakistan receive a very small amount of rainfall, especially those
located below the latitude of 32 degrees. The study area consists mostly of the
northern sites of Pakistan and Azad Jammu & Kashmir. The data have been gathered from the
Meteorological Department of Pakistan. We have selected different rainfall and flood sites for
this analysis. Although the rainfall sites are not very close to the flood sites, all of
these sites lie within a single geographical region that covers both the rainfall and the flood areas.
This study is useful for predicting the worst rainfall in the coming years. It is based on
rainfall data collected and compiled with the contribution of the Meteorological
Department of Pakistan.
Pakistan is an agricultural country. It lies between latitudes 24° N and 37° N and
longitudes 61° E and 77° E. Its climate is hot and arid; however, there
is vast diversity in its climate. Some areas of Pakistan receive high rainfall, some receive
moderate rainfall and some receive very little rainfall. Pakistan's monsoon regions usually
receive heavy rainfall during the monsoon period, which results in floods due to the lack of proper
water resource management and planning.
1.10 Geographical positions of the sites

S/N  Site         Latitude (N)  Longitude (E)  Elevation  Years      Length  Site Description
1    Balakot      34.33         72.21          995.40 m   1977-2012  36      Rainfall
2    Dir          35.12         71.51          1375.0 m   1977-2012  36      Rainfall
3    Kohat        33.35         71.26          489 m      1977-2012  36      Rainfall
4    Mangla       33.14         73.64          147 m      1925-2013  89      Flood
5    Marala       32.67         74.46          250 m      1925-2012  88      Flood
6    Muzafarabad  34.22         73.29          702 m      1955-2010  56      Rainfall
7    Risalpur     34.04         71.58          1014 m     1977-2012  36      Rainfall
8    Shahdara     34.15         73.49          820 m      1925-2012  88      Flood
9    Terbela      34.74         72.48          148 m      1977-2012  36      Flood

Table 1.3
The route map runs from Terbela dam to Mangla dam through Dir, Risalpur, Kohat and
Balakot,

where point A indicates the Terbela flood site, point B the Dir rainfall site, point C the
Risalpur rainfall site, point D the Kohat rainfall site, point E the Balakot rainfall site and
point F the Mangla flood site.

CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
Global warming is currently a major issue all around the world. Glaciers are melting
year by year because of rising temperatures. Changes in temperature cause
extreme events such as extreme temperatures, heavy rainfall and high floods. Increasing rainfall
intensity leads to higher flood levels.
In the mid-20th century, Professor Gumbel was the first to suggest the application of the extreme
value distribution. Gumbel (1941) used the extreme value distribution for the empirical analysis of
meteorological variables (annual flood flow, maximum precipitation, etc.). Statisticians and
engineers have used it frequently since then.
Huff and Neill (1959) studied maximum rainfall magnitudes in Illinois and compared five
different statistical distributions. They analyzed annual and seasonal maxima for periods of 1 to
10 days, using data from 30 stations with 40 years of record each, and computed useful
statistical results. The method of moments and the least squares method were compared, and the
difference between their results was found to be insignificant.
Hershfield (1962) investigated the annual maximum series (AMS) of 24-hour rainfall data
in the USA. He used the Gumbel distribution, which appeared to fit well and gave significant results.
Alexander (1963) used the method of storm transposition for estimating the frequency of rare
events.
Markovic (1965) fitted five probability distributions to annual precipitation and river flow
data from 2506 gauge stations in Canada and the USA. The five candidate distributions
were the normal, the two-parameter log-normal, the three-parameter log-normal, the
two-parameter gamma and the three-parameter gamma. He found that the differences between the
gamma and log-normal results were insignificant, and that the gamma distribution with fewer
parameters also did not differ significantly from the three-parameter gamma (Markovic, 1965).
Dickinson (1976) applied extreme value distributions to rainfall data from three stations in
Southern Ontario in order to develop useful extreme value models of rainfall. He
also suggested analyzing seasonal patterns to estimate rainfall runoff.

Landwehr et al. (1979) compared three methods of parameter estimation that are of great interest
in inferential statistics: the method of moments, maximum likelihood and the probability weighted
moments (PWM) method. They applied the three methods to the Gumbel distribution and
concluded that PWM performed best of the three.
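To make two of these estimation methods concrete, the following Python sketch computes method of
moments and PWM estimates for the Gumbel distribution with location and scale parameters. It is a
minimal illustration under stated assumptions, not the procedure of Landwehr et al.: the function
names are hypothetical, the moment estimator uses the sample standard deviation, and the PWM
estimator uses the unbiased sample estimate of the first probability weighted moment.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_mom(sample):
    """Method of moments estimates (location, scale) for the Gumbel distribution.

    Uses mean = location + gamma * scale and variance = (pi * scale)^2 / 6.
    """
    scale = stdev(sample) * math.sqrt(6) / math.pi
    location = mean(sample) - EULER_GAMMA * scale
    return location, scale


def gumbel_pwm(sample):
    """Probability weighted moments estimates (location, scale) for the Gumbel distribution.

    b0 is the sample mean; b1 is the unbiased estimate of E[X F(X)].
    For the Gumbel distribution, 2*b1 - b0 = scale * ln(2).
    """
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    # enumerate starts at 0, so i equals (rank - 1) for the ordered values
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    scale = (2 * b1 - b0) / math.log(2)
    location = b0 - EULER_GAMMA * scale
    return location, scale
```

Applied to a series of annual maxima, both functions return a (location, scale) pair, so the two
sets of estimates can be compared directly for the same data.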
Stern and Coe (1984) fitted non-stationary Markov chains to the occurrence of rainfall.
They fitted gamma distributions, with parameters varying with the time
of year, to rainfall totals, and derived results from the fitted models that are useful for
prediction and planning.
Haktanir (1992) applied thirteen different distributions to annual flood peak series of more
than 30 observations taken from 45 unregulated streams in Anatolia. The parameters of these
distributions were mostly estimated by the methods of maximum likelihood,
moments and probability weighted moments (Haktanir, 1992).
The World Meteorological Organization (1989) published a summary report for
policy makers and engineers. The report discussed different methodologies for estimating
extreme events and the use of different distributions for the data.
Aksoy (2000) suggested that the gamma distribution is appropriate for
daily rainfall analysis. He used Markov chains to determine the dry and wet days and
the gamma distribution to generate sequences of
daily rainfall data.
Koutsoyiannis and Baloutsos (2000) investigated a long record of annual maximum rainfall
covering 136 years in Greece. They advised that
the conventionally used extreme value type 1 distribution was not appropriate
and suggested that the generalized extreme value distribution is much better for
the analysis of such records and is a good predictor for return periods.
Park et al. (2001) studied the maximum summer rainfall in South Korea using data from sixty-one
gauge stations. They fitted the Wakeby distribution by L-moments and estimated quantiles for
different return periods. They produced isopluvial maps of the estimated design values for
different return periods.
Kuczera (2001) carried out a comprehensive study of at-site flood frequency data. He used a
Monte Carlo Bayesian method to estimate confidence limits for quantiles and the expected
probability for any flood frequency distribution.
Pathak (2001) carried out a frequency analysis of short-duration rainfall in the South Florida
Water Management District. The data covered the period from January 1, 1900 to
December 31, 1999. He considered the highest rainfall over one-day, three-day and five-day periods.
Zalina et al. (2002) compared eight candidate distributions to find the
best and most reliable estimator of annual maximum rainfall in Peninsular Malaysia. They applied
extrapolation of quantiles and a goodness of fit test and found that the generalized extreme value
distribution is a good fit to the data.
Coles et al. (2003) used Bayesian inference to model daily rainfall
data. They found a distribution that can predict extreme rainfall in the coming years.
Ware and Lad (2003) used two different methods (frequentist and conventional) to
compare the precision and accuracy of regional and at-site analyses of flood quantiles for the
Waimakariri River. They found that the frequentist method performed well compared with
estimates from the conventional method.
Ahsanullah, Chan and Balakrishnan (1993) discussed recurrence relations between the
product moments of record values from the extreme value distribution. They
also derived the single and product moments in a simple way using these relations.
Kamps (1995) investigated order statistics and record values. He obtained useful
results, including explicit expressions and recurrence
relations for the moments of generalized order statistics from Pareto, power function and Weibull
distributions. He introduced the idea of generalized extreme values and their properties, and derived
the joint density function of the first r and the nth uniform generalized order statistics. He
proposed necessary and fundamental conditions for the existence of moments of the generalized
order statistics.
Brabson and Palutikof (1999) carried out a frequency analysis of extreme wind speeds for five
Scottish islands. They applied the Generalized Pareto Distribution to the data and found that the
distribution fails in the presence of non-stationarity in the wind speeds.
Pawlas and Szynal (2000) discussed some necessary conditions for the characterization of the
inverse Weibull and generalized extreme value distributions by means of the moments of kth
record values.
Thompson et al. (2006) were the first to introduce a global index for earthquakes, similar to the
index flood method. They used data from 46 regions around the globe and showed, using goodness of
fit tests and L-moment diagrams, that the GPA and GEV distributions are the best fits to the
magnitude and annual maximum series.
Soliman, Abd Ellah and Sultan (2006) investigated the Bayesian analysis of the two-parameter
Weibull distribution on the basis of record values. They obtained Bayes and
maximum likelihood estimators based on record values and also discussed the hazard and
reliability functions.
The Intergovernmental Panel on Climate Change (Change, 2007) reported that the eleven years from
1996 to 2006 were the warmest years. It compared its results with those of the Third Assessment
Report to see the change in the linear trend. It found that the 100-year linear trend for
1906-2005 is 0.74 °C (range 0.56 °C to 0.92 °C), which is larger than the corresponding trend of
0.6 °C for 1901-2000 given in the Third Assessment Report. The linear warming trend over the last
half century is about double that over the last century.

Kao (2008) conducted an at-site frequency analysis of rainfall by using hourly precipitation data
for 53 gauging stations in Indiana. A combination of the generalized extreme value distribution and the
extreme value type-1 distribution was used to find at-site estimates, and the at-site and
regional estimates were compared. He found that the regional estimates were not better than
the at-site estimates.
Khan et al. (2008) found that the Frechet distribution is very useful and flexible, having the
property of converging to different distributions. They used Monte Carlo simulation to compare
the shape and scale parameters. They also gave some important results on the relation of the
shape parameter to the mode, mean, median, variance, coefficient of variation, kurtosis and
skewness. They also used mathematical and graphical techniques for the theoretical analysis of the
Frechet distribution.
Sultan (2008) used Bayesian and maximum likelihood methods to estimate the
parameters of the Frechet distribution. He worked with two different cases: in one, both
parameters (shape and scale) are unknown, and in the other the location parameter is kept as
known. He estimated the hazard function and survival rate, and compared the mean
square errors, which were estimated by simulation.
Kwon et al. (2008) used the Gumbel mixed model for bivariate storm frequency
analysis. They used hourly rainfall data collected for 34 years at Jecheon station in Korea. They
estimated the bivariate return periods, joint return periods and conditional return periods of storm
events.
Jakob et al. (2009) investigated the pattern of extreme rainfall in Sydney, Australia, in the absence
of stationarity in the data. The data covered the years 1921-2005 at Sydney Observatory
Hill. They found that the rainfall pattern in Australia showed a large amount of variation both
seasonally and spatially.
In 2009, the NOAA National Climatic Data Center compiled a climate report indicating that the
temperature had increased in the last few decades. This report was based on the work of 300 scientists from
160 research groups all over the world. They used ten indicators to see the behavior of the
weather and temperature. Seven of these ten indicators were found to be increasing, which
indicates increasing temperature during the last few decades.
Table 1.2
Indicators of Warming World

Sr #   Indicators                      Increasing   Decreasing
1      Air temperature near surface    Yes          No
2      Humidity                        Yes          No
3      Snow cover                      No           Yes
4      Temperature over oceans         Yes          No
5      Sea surface temperature         Yes          No
6      Sea ice                         No           Yes
7      Ocean heat content              Yes          No
8      Temperature over land           Yes          No
9      Glaciers                        No           Yes
10     Sea level                       Yes          No

Hamdi (2011) presented a new approach to estimate droughts in Tunisia. He conducted a
frequency analysis which used the deceedance probability for the first time. Historical information
was evaluated completely and authentically in the study. A combination of the Weibull formula and the
log-normal III distribution was considered the best fit to the data.
Weiss et al. (2013) studied the probabilistic patterns of extreme storm surges occurring along
the French coasts of the Atlantic Ocean. They formed homogeneous regions, based on the storm
events common to the sites, to predict the occurrence of extreme storm surges; the regions included the
Eastern and Western English Channel regions and the Atlantic region.
Kantima Meeyaem et al. (2014) proposed a hybrid model for flood forecasting using a case
study area. They discussed three models based on mathematical techniques, namely the Gumbel
distribution function, drainage density and the Muskingum method. They found that hybrid models
are very useful for flood prediction, especially with small data sets.

Chapter No 3
Methodology

3.1 Introduction

The purpose of this analysis is to evaluate the relationship between the length and intensity of
extreme events (such as heavy rainfall and severe floods) and the chances of these events
occurring. In this study we have taken the annual flood peaks of four dams of Pakistan and
annual rainfall data of different stations. Most of these rainfall stations are located near
the dams. The empirical data analysis has been done by using descriptive summaries
and graphical techniques. The graphical techniques (probability plots, histograms,
density graphs, empirical distribution functions, etc.) can be used to find the probability
density function suitable for the relevant data.
3.2 Quantile

Quantiles are of vital importance in the field of statistics. They split the distribution into as many
parts as the researcher needs. They are particularly important in hydrology for the
estimation of the intensity of rainfall, floods, droughts, storms, etc., where they provide the basis for
future planning policies and hydraulic designs. A quantile is the value of an extreme event that has a
particular probability of exceedence and a specified return period.
3.3 Exceedence Probability

It is simply the probability that a particular event will exceed a specified value. In frequency
analysis of flood peaks, the probability that the flood height exceeds the available capacity is known as the
exceedence probability.
The Annual Exceedence Probability (AEP) is the expected chance of occurrence of a
natural hazard event (such as a rainfall or flooding event) within a year; it is mostly expressed in
percentage form. Extreme flood events are exceeded only rarely, so such events have a
smaller annual probability. It is given by

$$ AEP = \frac{k}{n+1} \times 100 $$

where k is the rank of the observed value and n is the total number of observations.
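
As a minimal sketch of the AEP formula, the following Python function ranks a sample and attaches the percentage k/(n+1) × 100 to each value. It assumes that rank 1 is given to the largest observation (the text does not state the ranking direction) and that there are no ties; the function name and the sample values are hypothetical and are not taken from the thesis data.

```python
def annual_exceedance_probability(values):
    """Return (value, rank, AEP in percent) tuples, largest value first.

    AEP = k / (n + 1) * 100, where k is the rank of the value
    (assumption: rank 1 = largest) and n is the number of observations.
    """
    n = len(values)
    ordered = sorted(values, reverse=True)
    return [(x, k, 100.0 * k / (n + 1)) for k, x in enumerate(ordered, start=1)]


# Hypothetical annual flood peaks (illustrative numbers only).
peaks = [320.0, 150.0, 410.0, 275.0]
for value, rank, aep in annual_exceedance_probability(peaks):
    print(f"peak={value:7.1f}  rank={rank}  AEP={aep:5.1f}%")
```

With four observations the largest peak receives an AEP of 20% and the smallest 80%, so rarer (larger) events carry the smaller annual probability, as described above.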
3.4 Method of Estimations

There are many methods and advanced techniques available for estimating parameters.
Different researchers use different techniques according to their convenience.
The following methods are commonly used:
Method of moments
Maximum likelihood method
Probability weighted moments
3.4.1 Method of moments

The method of moments (MOM) is one of the oldest estimation techniques. It was developed and used
by Karl Pearson (1857-1936). In this method the sample moments (raw moments) are
equated to the corresponding population moments.
The non-central rth population moment is calculated as

$$ \mu'_r = \int_{-\infty}^{\infty} x^r f(x)\,dx $$

for a continuous random variable x, and the sample moments are

$$ m_r = \frac{1}{n}\sum_{i=1}^{n} X_i^r, \qquad r = 1, 2, \dots $$

3.4.2 Maximum Likelihood Method

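The maximum likelihood method is a well-known and most frequently used method for estimating
unknown parameters. Let $L(x_1, x_2, \dots, x_n; \theta)$ be the likelihood function of the random variables $x_i$
based on the unknown parameter $\theta$. Now let $\hat{\theta} = \hat{\theta}(x_1, x_2, \dots, x_n)$ be a function of the sample observations, i.e.
$\hat{\theta}$ is the value of $\theta$ that maximizes $L(x_1, x_2, \dots, x_n; \theta)$; then $\hat{\theta}$ is called the ML estimator
of $\theta$. The MLE can be found by using the following equations for $\theta$:

$$ \frac{\partial L(\cdot)}{\partial \theta} = 0 \quad \text{or} \quad \frac{\partial \log L(\cdot)}{\partial \theta} = 0, \qquad \frac{\partial^2 L(\cdot)}{\partial \theta^2} < 0 \tag{3.4.1} $$

The first-order condition in equation (3.4.1) gives the maxima or minima of the likelihood function, and the second-order condition identifies a maximum.
The functions $L(\cdot)$ and $\log L(\cdot)$ give the same value of the parameter $\hat{\theta}$, and the log
likelihood function $\log L(\cdot)$ is often much easier to use for finding $\hat{\theta}$ than the
likelihood function.
The Cramer-Rao inequality can be used to find the (asymptotic) variance of the MLE:

$$ V(\hat{\theta}) = \left[ -E\left( \frac{\partial^2 \log L(\cdot)}{\partial \theta^2} \right) \right]^{-1} $$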
The important properties of the MLE are as follows:

ML estimators are usually consistent: under some regularity conditions, when the sample size becomes large enough, i.e. $n \to \infty$, a unique consistent MLE exists.
ML estimators are asymptotically normal, i.e. normality is attained when n becomes very large.
ML estimators are asymptotically efficient as n increases.
If a sufficient estimator exists, it is always found by the ML method.
The ML estimator may be biased, and can be moderately biased when the sample size is quite small; in that case a lack of normality may also be suspected.
The likelihood function is always positive or zero, i.e. it cannot be negative.
ML estimators are invariant under functional transformations.
The MLE can only be found when the density function is available.
The MLE may or may not be unique. If $\hat{\theta}$ is an MLE of $\theta$ and $h(\theta)$ is a one-to-one function of $\theta$, then $h(\hat{\theta})$ is an MLE of $h(\theta)$.
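
The likelihood equation and the invariance property can be illustrated with a small sketch. It assumes an exponential sample with rate parameter λ, for which log L = n log λ − λ Σx and the solution of ∂ log L/∂λ = 0 is λ̂ = n/Σx; the function names and the simulated data are illustrative only.

```python
import math
import random


def exp_log_likelihood(rate, data):
    """Log-likelihood of an exponential sample: n*log(rate) - rate*sum(x)."""
    return len(data) * math.log(rate) - rate * sum(data)


def exp_mle(data):
    """Closed-form MLE of the rate: the root of d(log L)/d(rate) = 0."""
    return len(data) / sum(data)


random.seed(1)
sample = [random.expovariate(0.5) for _ in range(200)]  # simulated, true rate 0.5
rate_hat = exp_mle(sample)

# The log-likelihood at the MLE is at least as large as at nearby rates.
for other in (0.8 * rate_hat, 1.2 * rate_hat):
    assert exp_log_likelihood(rate_hat, sample) >= exp_log_likelihood(other, sample)

# Invariance: the MLE of the mean 1/rate is 1/rate_hat, i.e. the sample mean.
mean_hat = 1.0 / rate_hat
print(rate_hat, mean_hat)
```

For distributions without a closed-form solution, the same log-likelihood function would have to be maximized numerically instead.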


3.5 Trend analysis by Graphical technique

In time series analysis, the long term movement is called a trend. A trend can be of two types: (i)
increasing with the passage of time, or (ii) decreasing with time. A time plot is used to see
whether the trend is increasing or decreasing. A time plot is a simple graph which shows the
observed values against the corresponding time points.

3.6 Q-Statistics

The Ljung-Box Q-statistic at lag k is a test statistic for the null hypothesis that there is no
autocorrelation up to order k. It is calculated as

$$ Q_{LB} = n(n+2) \sum_{j=1}^{k} \frac{r_j^2}{n-j} $$

where $r_j$ is the j-th autocorrelation and n is the number of observations. If the series is not based
on the results of ARIMA estimation, then under the null hypothesis

$$ Q_{LB} \overset{asy}{\sim} \chi^2(k) $$

that is, Q is asymptotically distributed as a $\chi^2$ with degrees of freedom equal to the number of lags of
autocorrelations.
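
A short sketch of the Q-statistic in Python follows. It assumes the common form of the sample autocorrelation $r_j$ that uses the overall mean of the series; the function names are illustrative.

```python
def autocorr(series, lag):
    """Sample autocorrelation r_j using the overall mean (an assumed, common form)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def ljung_box_q(series, max_lag):
    """Q_LB = n (n + 2) * sum_{j=1}^{k} r_j^2 / (n - j), with k = max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorr(series, j) ** 2 / (n - j) for j in range(1, max_lag + 1)
    )


# Tiny illustrative series: r_1 = 0.4, so Q_LB = 5 * 7 * 0.16 / 4 = 1.4.
print(ljung_box_q([1, 2, 3, 4, 5], 1))
```

The computed value is then compared with the chi-square critical value with k degrees of freedom; a larger Q leads to rejection of the null hypothesis of no autocorrelation.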
3.7 Autocorrelation

There are three types of data used for empirical analysis:

Cross section
Time series
Pooled

Pooled data is a combination of time series and cross sectional data. Time series data is an
important type of data for any empirical analysis. A number of assumptions are made when
developing models on these types of data.
The examination of the relation between two or more sets of variables has always been of great interest to
investigators. This kind of relationship is considered as correlation among those variables.
The correlation between two random variables (X, Y) is the interdependence between
these random variables.
The correlation is measured by using the formula:

$$ \rho = \frac{E\left[(X - E X)(Y - E Y)\right]}{\sqrt{E (X - E X)^2 \; E (Y - E Y)^2}} $$

For sample data it is calculated by using the formula:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})/n}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2/n \; \sum_{i=1}^{n} (y_i - \bar{y})^2/n}} = \frac{cov(x, y)}{S_x S_y} $$

Autocorrelation provides a good lead for investigating the properties of a time series. The
autocorrelation is the simple correlation between the pairs of observations
$(x_1, x_{1+k}), (x_2, x_{2+k}), \dots, (x_{n-k}, x_n)$, $k > 0$, and is called the autocorrelation at lag k:

$$ r(k) = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x}_{(1)})(x_{t+k} - \bar{x}_{(2)})/(n-k)}{\sum_{t=1}^{n} (x_t - \bar{x})^2/n} $$

where

$$ \bar{x}_{(1)} = \sum_{t=1}^{n-k} x_t/(n-k), \qquad \bar{x}_{(2)} = \sum_{t=k+1}^{n} x_t/(n-k) \qquad \text{and} \qquad \bar{x} = \sum_{t=1}^{n} x_t/n $$
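
The lag-k autocorrelation defined above, with separate means for the leading and lagged segments, can be sketched as follows. The function name is illustrative, and the lag is assumed to satisfy 0 < k < n.

```python
def lag_autocorrelation(x, k):
    """r(k) with separate means for the leading and the lagged segment."""
    n = len(x)
    if not 0 < k < n:
        raise ValueError("lag k must satisfy 0 < k < n")
    m = n - k
    mean_lead = sum(x[:m]) / m   # mean of x_1 .. x_{n-k}
    mean_lag = sum(x[k:]) / m    # mean of x_{k+1} .. x_n
    mean_all = sum(x) / n        # overall mean
    cov = sum((x[t] - mean_lead) * (x[t + k] - mean_lag) for t in range(m)) / m
    var = sum((v - mean_all) ** 2 for v in x) / n
    return cov / var


# For the series 1, 2, 3, 4, 5 at lag 1: covariance 1.25, variance 2, r(1) = 0.625.
print(lag_autocorrelation([1, 2, 3, 4, 5], 1))
```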

Cross sectional data is collected through a random sample of cross-sectional units. For
example, in data on household consumption collected through a sample survey, one cannot
assume in advance that the random error term of one household is correlated with that of another
household. If such correlation exists among the cross-sectional units, it is called spatial
autocorrelation.

3.7 Class of Distributions

Probability distributions are used in various fields of research (hydrology, economic
variables, civil engineering designs and models, weather forecasting and flood risk
management).
The following distributions are used to carry out the analysis of rainfall and flood data.

3.10.1 Extreme value Distribution

Extreme events have very negative impacts in some fields. These events happen very rarely
but have great consequences, for example large amounts of snowfall, extreme floods,
high temperatures, storms and high wind speeds. Most researchers and analysts use EVT
(extreme value theory) to develop suitable models for evaluating the loss and risk due to
extreme events.
The probability density function of the extreme value distribution is as follows:

$$ f(x) = \frac{1}{\sigma} \exp\left[ \left(\frac{x-\mu}{\sigma}\right) - \exp\left\{ \left(\frac{x-\mu}{\sigma}\right) \right\} \right], \qquad -\infty < x < \infty, \; \sigma > 0 $$

where $\sigma$ is a scale parameter and $\mu$ is a location parameter. If Z follows a Weibull distribution with
scale and shape parameters $(\alpha, \beta)$, then log(Z) follows the extreme value distribution with
$\mu = \log \alpha$ and $\sigma = 1/\beta$.
3.10.2 Generalized Extreme value Distribution

The Generalized Extreme Value distribution (GEV) belongs to a class of
continuous probability distributions. It is also known as the Fisher-Tippett distribution. Extreme
value distributions are recognized as limiting distributions for optimization problems. The GEV
distribution is the limiting distribution of the normalized maxima or minima obtained from a set of identical and
independent random variables. Extreme value theory (EVT) provides the basis
for measuring and modeling events which have a very low chance of occurrence.

$$ f(x) = \frac{1}{\sigma} \left[ 1 - k \frac{x-\mu}{\sigma} \right]^{\frac{1}{k} - 1} \exp\left[ -\left\{ 1 - k \frac{x-\mu}{\sigma} \right\}^{\frac{1}{k}} \right] $$

where k is the shape parameter of the GEV distribution, $\mu$ is the location parameter, $\sigma > 0$ is the scale parameter and

$$ 1 - k \frac{x-\mu}{\sigma} > 0, \qquad k \neq 0 $$

In this parameterization the GEV distribution converges to Type I when $k \to 0$, to Type III for $k > 0$ and to Type II for $k < 0$.

3.10.3 Exponential Distribution

The exponential distribution is a continuous distribution. It measures the length of time
between two events in a Poisson process. These events are rare events. The pdf is

$$ f(x) = \lambda e^{-\lambda x}, \qquad 0 \le x < \infty, \; \lambda > 0 $$

and the distribution function is as follows:

$$ F(x) = 1 - e^{-\lambda x} $$
3.10.4 Gamma Distribution

The Gamma distribution is a two parameter continuous probability distribution. It is very useful
because of its flexibility. The chi-square and exponential distributions are special cases
(sometimes called the children) of the Gamma distribution.

$$ f(x) = \frac{x^{a-1} e^{-x/b}}{b^a \, \Gamma(a)}, \qquad x > 0, \; a, b > 0 $$

where $\Gamma(a)$ is the gamma function defined as

$$ \Gamma(a) = \int_0^{\infty} x^{a-1} e^{-x}\,dx $$

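3.10.5 Normal Distribution

The Normal distribution is the best known and most frequently used continuous distribution.
It is also called the Gaussian distribution. Its pdf is

$$ f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right], \qquad -\infty < x < \infty, \; -\infty < \mu < \infty \text{ and } \sigma > 0 $$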

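3.10.6 Three parameters Log-Normal Distribution

The log normal (3P) distribution belongs to a class of continuous distributions. It has
three parameters $\mu$, $\sigma$ and $\gamma$. The probability density function is

$$ f(x; \mu, \sigma, \gamma) = \frac{1}{(x-\gamma)\,\sigma\sqrt{2\pi}} \exp\left[ -\frac{\{\log(x-\gamma) - \mu\}^2}{2\sigma^2} \right], \qquad \gamma < x < \infty, \; \sigma > 0 $$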
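3.10.7 Logistic Distribution

The logistic distribution also belongs to a family of continuous distributions. The shape of the
logistic distribution is similar to that of the normal distribution, but its peak is higher than that of the normal. The logistic
distribution is useful for hydrologic records (discharge in rivers, rainfall, etc.).

$$ f(x) = \frac{\exp\left[ -\left(\frac{x-\mu}{\sigma}\right) \right]}{\sigma \left[ 1 + \exp\left\{ -\left(\frac{x-\mu}{\sigma}\right) \right\} \right]^2}, \qquad -\infty < x, \mu < \infty, \; \sigma > 0 $$

3.10.8 Nakagami Distribution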

The Nakagami distribution was proposed by Nakagami (1960). It is useful for developing models of
the fading of radio frequency signals. Applications of this distribution are spread over many
fields such as communications, hydrology, analysis of multimedia, traffic over networks and
ultrasound data.

$$ f(x) = \frac{2}{\Gamma(m)} \left( \frac{m}{\Omega} \right)^{m} x^{2m-1} \exp\left( -\frac{m x^2}{\Omega} \right), \qquad x > 0 $$

where m is the shape parameter and $\Omega$ is a scale parameter.
If m = 1 this distribution reduces to the Rayleigh distribution, and if m = 0.5 it reduces
to the half normal distribution.

3.10.9 Weibull Distribution

Among the large group of well-known distributions, the Weibull distribution is very useful for analyzing life
time data. The Inverse Weibull distribution also plays a vital role in the prediction and analysis of
many extreme events such as earthquakes, rainfall, sea currents, floods and wind speeds.
Applications of the Inverse Weibull distribution in many fields are given in Harlow (2002), who
found this distribution important for modeling the statistical behavior of material properties
for applications in the field of engineering. Nadarajah and Kotz (2008) pointed out sociological
models based on Inverse Weibull random variables. The scale form of the Inverse Weibull
distribution has its density function given by

$$ f(x) = c \lambda x^{-(c+1)} e^{-\lambda x^{-c}}, \qquad x > 0, \; \lambda > 0, \; c > 0 $$

where c is the shape parameter and $\lambda$ is the scale parameter.

The location-scale form of the Inverse Weibull distribution has its density function given by

$$ f(x) = \frac{c}{\lambda} \left( \frac{x-\theta}{\lambda} \right)^{-(c+1)} e^{-\left( \frac{x-\theta}{\lambda} \right)^{-c}}, \qquad x > \theta, \; \lambda > 0, \; c > 0 $$

where c is the shape parameter, $\theta$ is the location parameter and $\lambda$ is the scale parameter.

3.10.10 Inverse Gaussian distribution

The inverse Gaussian distribution was derived in 1915 by Schrodinger. In 1945 Tweedie proposed the
name inverse Gaussian distribution. In 1947 Wald revisited this distribution
and suggested it as a limiting form for the sample size in the sequential probability ratio test. That is why the
inverse Gaussian distribution is also called the Wald distribution. Its pdf is as follows:

$$ f(x) = \left[ \frac{\lambda}{2\pi x^3} \right]^{1/2} \exp\left\{ -\frac{\lambda (x-\mu)^2}{2\mu^2 x} \right\}, \qquad x > 0 \text{ and } \mu, \lambda > 0 $$

where $\mu$ is the mean and $\lambda$ is the shape parameter.

3.10.11 Rayleigh Distribution

The Rayleigh distribution belongs to a class of continuous distributions. It is used for
complex numbers, wind speed, wave length, etc.
A random variable is said to follow a Rayleigh distribution if it has the pdf

$$ f(x) = \frac{x}{\sigma^2} \exp\left( -\frac{x^2}{2\sigma^2} \right), \qquad x \ge 0 $$

where $\sigma$ is a scale parameter and $\sigma > 0$.

The distribution function is as follows:

$$ F(x) = 1 - \exp\left( -\frac{x^2}{2\sigma^2} \right) $$

3.10.12 Frechet Distribution

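The French mathematician Maurice Frechet (1878-1973) gave a limiting distribution for the
sequence of maxima under scale normalization (Frechet, 1927).

The Frechet distribution has probability density function

$$ f(x) = \frac{\alpha}{\beta} \left( \frac{x-\gamma}{\beta} \right)^{-\alpha-1} \exp\left\{ -\left( \frac{x-\gamma}{\beta} \right)^{-\alpha} \right\}, \qquad \alpha > 0, \; \beta > 0 \text{ and } x > \gamma $$

where $\beta$ is a scale parameter, $\alpha$ is a shape parameter and $\gamma$ is a location parameter. The
cumulative distribution function is

$$ F(x) = \exp\left\{ -\left( \frac{x-\gamma}{\beta} \right)^{-\alpha} \right\} $$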

3.11 Goodness of fit test

As an intermediate step, the probability distributions have been fitted to the rainfall and flood data of the different sites.
After that a goodness of fit test is carried out to see whether each distribution is
a good fit for the available data. For this purpose the following tests are used.
Chi Square goodness of fit test
The chi-square test has wide application in the literature and is commonly used for investigating the
goodness of fit of a particular distribution to the data.
The hypotheses of the chi-square test are
H0 : Distribution is a good fit for data
H1 : Distribution is not a good fit for data
Test statistic:

$$ \chi^2 = \sum_{i=1}^{N} \frac{(f_o - f_e)^2}{f_e} $$

with (N-n-1) degrees of freedom, where $f_o$ are the observed frequencies and $f_e$
are the expected frequencies, N is the number of classes and n is the number of estimated parameters.

The critical region is

$$ \chi^2 > \chi^2_{\alpha}(v) $$

where v is the degree of freedom.
The conclusion is based on the critical region and the calculated value of chi-square. If the calculated
value of chi-square is greater than the critical value then one can reject the null hypothesis;
otherwise one fails to reject it.
Kolmogorov-Smirnov test
The Kolmogorov-Smirnov test is another tool for testing the goodness of fit to a specified
distribution. The hypotheses under this test are
H0 : selected sample is drawn from the specified distribution.
H1 : selected sample is not drawn from the specified distribution.
The test statistic used is

$$ D_n = \sup_x \left| F_n(x) - F(x) \right| $$

where $\sup_x$ denotes the supremum of the set of distances over the ordered elements, $F_n$ is the empirical distribution function and F is the specified distribution function.

The decision is based on the value of $D_n$: if the value of $D_n$ is close to zero
then the distribution is considered a good fit to the data.
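
The two test statistics can be sketched in Python as follows. The function names and the exponential example are illustrative; the critical values against which the statistics are compared are not computed here.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum over classes of (f_o - f_e)^2 / f_e."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)|, checked on both sides of each jump of F_n."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_cdf(rate):
    """Return F(x) = 1 - exp(-rate * x) as a function of x."""
    return lambda x: 1.0 - math.exp(-rate * x)


# Hypothetical class frequencies and a hypothetical sample (illustrative only).
print(chi_square_statistic([10, 20, 30], [20, 20, 20]))
print(ks_statistic([0.2, 0.9, 1.7, 3.1], exponential_cdf(0.5)))
```

Because the empirical distribution function is a step function, the supremum is attained at one of the sample points, which is why the loop inspects the value just before and just after each jump.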
3.12 Probability Plots

Probability plots are commonly used as a graphical technique for checking the basic
assumptions about the nature of the data. The observed data are plotted against the
theoretical distribution and the position of the points around the line is examined. If
most of the points lie around the straight line then the observed data follow the theoretical
distribution. We have used some probability plots to see the behavior of the data.

3.13 Utilization of Software

The work has been done by using different statistical software including MATLAB 5,
SPSS 16, MINITAB 15 and EASYFIT. Some graphical analysis was obtained using the
R language.

CHAPTER 4
LEAST SQUARE ANALYSIS

4.1 Introduction

Researchers are always interested in the nature of the relation between variables. For
instance, a researcher may want to determine the relationship between disasters and extreme
events such as rainfall, storms, hurricanes, earthquakes and floods.
In recent years much work has been done to find better and more precise methods for the estimation of
linear models and for fitting data, but the least square method is still dominant
and is used as an important tool for estimating parameters.
The least square method is perhaps the most widespread technique in the field of statistics. There are
several factors behind this fact. Mathematically, the use of squares makes the least square method
very tractable, because the Pythagorean theorem indicates that when the error term is independent of
an estimated quantity one can add the squared error and the squared estimated quantity.
Another mathematical aspect is that the tools involved in the construction of the least square method
(eigen-decomposition, derivatives and singular value decomposition) have been studied for a
relatively long period of time.
As its name shows, the least squares estimate is obtained by minimizing the sum
of squares of the deviations from the corresponding observations. The idea that a combination of
different observations is the best estimate of the true value, with
errors decreasing with aggregation rather than increasing, goes back to Roger Cotes (1722).

4.2 Least Square estimates

A preliminary examination of the data has been done by fitting a straight line and by some graphical
techniques to see what kind of variation exists. For this purpose, we fit a straight line to the data
to see whether the slope is positive and to what degree. The least square method is commonly used to
find the estimates of the parameters. In this case a similar technique is used to find the estimates,
and those estimates are used for the prediction of the diversity of rainfall in coming years. It
was suggested by Sam C. Saunders, Prof. Emer., Washington State University.
Consider the yearly (maximum) flood height data over a period of, say, n years, where there may
be missing observations. The data are $\{y_i\}_{i \in J}$ where $J \subseteq \{1, 2, 3, \dots, n\}$; these $y_i$ represent real
measurements of the recorded yearly flood height in the ith year of the sample sequence.
A preliminary examination of the data can be done by using some graphical techniques. Perhaps the
following simple types of examination can be completed using elementary procedures. Consider
the expected model for the rainfall and flood data

$$ E[X_k] = \alpha + \beta k \quad \text{for } k = 1, 2, 3, \dots, n \tag{4.2.1} $$

We fit a straight line to the data $\{X_k\}_{k=1}^{n}$. Consider the sum of squares

$$ S = \sum_{k=1}^{n} (X_k - \alpha - \beta k)^2 \tag{4.2.2} $$

When some observations are missing, let $|J|$ be the cardinality of the set J, and let S denote the mean sum of squares as follows:

$$ S = \frac{1}{|J|} \sum_{i \in J} (y_i - \alpha - \beta i)^2 \tag{4.2.3} $$
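We are to obtain the LS-estimators, say $\hat{\alpha}$ and $\hat{\beta}$, as functions of the data. They are to be
found by solving the simultaneous equations obtained by setting the partial derivatives equal to zero, i.e.

$$ \frac{\partial S}{\partial \alpha} = 0 \qquad \text{and} \qquad \frac{\partial S}{\partial \beta} = 0 $$

First assume that $J = \{1, 2, 3, \dots, n\}$, where J is the set of independent observations, and recall the
formulae for the sum of integers and their squares:

$$ \sum_{i=1}^{n} i = \frac{n(n+1)}{2} \qquad \text{and} \qquad \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6} \tag{4.2.4} $$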

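Now we have

$$ \frac{\partial S}{\partial \alpha} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \alpha - \beta i) \tag{4.2.5} $$

and, using the sum of integers,

$$ \frac{\partial S}{\partial \alpha} = -2\left[ \bar{y} - \alpha - \beta \frac{n+1}{2} \right] \tag{4.2.6} $$

where we define

$$ \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i $$

and for later use

$$ y^* = \frac{2}{n(n+1)} \sum_{i=1}^{n} i \, y_i $$

Thus, solving the two equations simultaneously for $\alpha$ and $\beta$, we get

$$ \hat{\beta} = \frac{6 (y^* - \bar{y})}{n-1} \tag{4.2.7} $$

and

$$ \hat{\alpha} = \frac{(4n+2)\,\bar{y} - 3(n+1)\,y^*}{n-1} $$

where

$$ \bar{y} = \hat{\alpha} + \hat{\beta} \frac{n+1}{2} \qquad \text{and} \qquad y^* = \hat{\alpha} + \hat{\beta} \frac{2n+1}{3} $$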
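These estimates can also be written in a more convenient form, which is more useful for numerical
calculations:

$$ \hat{\alpha} = \frac{\sum X_k}{n} - \frac{3}{n(n-1)} \left[ 2 \sum k X_k - (n+1) \sum X_k \right] \tag{4.2.8} $$

$$ \hat{\beta} = \frac{6}{n(n-1)(n+1)} \left[ 2 \sum k X_k - (n+1) \sum X_k \right] \tag{4.2.9} $$

where k = 1, 2, 3, ..., n and n is the total number of observations.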
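
The convenient forms (4.2.8) and (4.2.9) translate directly into code. The following is a minimal sketch, assuming a complete series X_1, ..., X_n with no missing years; the function name and the test series are illustrative.

```python
def ls_trend_estimates(x):
    """Closed-form least-squares intercept and slope for k = 1..n (eqs 4.2.8, 4.2.9)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_x = sum(x)
    sum_kx = sum(k * xk for k, xk in enumerate(x, start=1))
    bracket = 2.0 * sum_kx - (n + 1) * sum_x   # 2*sum(k X_k) - (n+1)*sum(X_k)
    alpha_hat = sum_x / n - 3.0 * bracket / (n * (n - 1))
    beta_hat = 6.0 * bracket / (n * (n - 1) * (n + 1))
    return alpha_hat, beta_hat


# A series lying exactly on the line 5 + 2k is recovered exactly.
print(ls_trend_estimates([7.0, 9.0, 11.0, 13.0]))
```

Only the two sums ΣX_k and ΣkX_k are needed, which is what makes this form convenient for numerical calculation.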

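If Y ~ F with E[Y] = $\nu$, where $\{Y_i\}_{i=1}^{n}$ are identically and independently distributed, then

$$ E[\hat{\alpha}] = \nu \qquad \text{and} \qquad E[\hat{\beta}] = 0 $$

This shows that these estimators give the true answer in expectation, i.e. they are
unbiased when there is no true increase.
If there is an annual average increase $\kappa$ per year, namely $E[Y_i] = \nu + i\kappa$, then

$$ E[\bar{y}] = \nu + \kappa \frac{n+1}{2} $$

and it follows from (4.2.7) that $E[\hat{\alpha}] = \nu$ and $E[\hat{\beta}] = \kappa$.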

Once the least-squares estimates $\hat{\alpha}$ and $\hat{\beta}$ are computed, we can estimate the worst average
rainfall and flood peaks after ten years by using the formula

$$ \hat{\alpha} + (10 + n)\,\hat{\beta} \tag{4.2.10} $$

Now one might ask "What would the worst rainfall look like?" To answer this question, we could
then compute the reduced values

$$ Z_k = X_k - \hat{\alpha} - \hat{\beta} k \tag{4.2.11} $$

and find

$$ Z_{max} = \max_{k=1}^{n} Z_k \tag{4.2.12} $$

So the worst rainfall over the next ten years is estimated as

$$ \hat{\alpha} + (10 + n)\,\hat{\beta} + Z_{max} \tag{4.2.13} $$
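
The worst-case estimate can be sketched as two small functions. The function names are illustrative. The numerical example uses the Mangla estimates reported in Section 4.4 and assumes that the year 1925 corresponds to k = 1, so that 2015 corresponds to the time index k = 91; this index mapping is an assumption, not a statement from the text.

```python
def max_reduced_value(x, alpha_hat, beta_hat):
    """Z_max = max over k = 1..n of (X_k - alpha_hat - beta_hat * k)."""
    return max(xk - alpha_hat - beta_hat * k for k, xk in enumerate(x, start=1))


def worst_case_forecast(alpha_hat, beta_hat, z_max, k):
    """Trend value at time index k plus the largest reduced value."""
    return alpha_hat + beta_hat * k + z_max


# Mangla estimates from Section 4.4; k = 91 assumes 1925 is k = 1 (so 2015 is k = 91).
print(worst_case_forecast(271525.5, -829.500511, 474880.5, 91))
```

Under that assumption the result agrees with the 2015 entry of the Mangla forecast table to the printed precision.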

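4.3 Missing data analysis

The results of any analysis depend mostly on the availability and accuracy of the data.
Unfortunately, missing observations are a real problem for every researcher. Now we will
discuss how one can tackle the issue of missing values.
Recall that when $J \subseteq \{1, 2, 3, \dots, n\}$ with cardinality $|J|$, we have

$$ S = \frac{1}{|J|} \sum_{i \in J} (y_i - \alpha - \beta i)^2 \tag{4.3.1} $$

And so

$$ \frac{\partial S}{\partial \alpha} = -\frac{2}{|J|} \sum_{i \in J} (y_i - \alpha - \beta i) \qquad \text{and} \qquad \frac{\partial S}{\partial \beta} = -\frac{2}{|J|} \sum_{i \in J} (y_i - \alpha - \beta i)\, i \tag{4.3.2} $$

Now we equate both of these to zero and, with some acknowledged ambiguity, we
also now denote

$$ \bar{y} = \frac{1}{|J|} \sum_{i \in J} y_i \qquad \text{and} \qquad y^* = \frac{1}{|J|} \sum_{i \in J} i \, y_i \tag{4.3.3} $$

$$ a = \frac{1}{|J|} \sum_{i \in J} i \qquad \text{and} \qquad b = \frac{1}{|J|} \sum_{i \in J} i^2 \tag{4.3.4} $$

The two equations determining the estimators are the solutions to the pair

$$ \bar{y} = \alpha + a\beta \qquad \text{and} \qquad y^* = a\alpha + b\beta $$

which are

$$ \hat{\beta} = \frac{y^* - a\,\bar{y}}{b - a^2} \qquad \text{and} \qquad \hat{\alpha} = \frac{b\,\bar{y} - a\,y^*}{b - a^2} \tag{4.3.5} $$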

It should also be noted that no error is made by writing the ith year instead of the
calendar year, say m + i, where perhaps m = 1997. But if there is no data for two years then the
next coded entry would be the appropriate i value plus 2.
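
The estimators in (4.3.5) can be sketched in Python with the observed years stored as a mapping from the coded year index i to the observation y_i, so that missing years are simply absent. The function name and the example data are hypothetical.

```python
def ls_trend_with_gaps(observations):
    """Least-squares intercept and slope from {year index i: y_i}, i in J (eq 4.3.5)."""
    size = len(observations)                       # |J|
    y_bar = sum(observations.values()) / size
    y_star = sum(i * y for i, y in observations.items()) / size
    a = sum(observations) / size                   # mean of the indices in J
    b = sum(i * i for i in observations) / size    # mean of the squared indices
    denom = b - a * a
    if denom == 0:
        raise ValueError("need at least two distinct year indices")
    beta_hat = (y_star - a * y_bar) / denom
    alpha_hat = (b * y_bar - a * y_star) / denom
    return alpha_hat, beta_hat


# Hypothetical record with years 3 and 4 missing; the points lie on 5 + 2i.
print(ls_trend_with_gaps({1: 7.0, 2: 9.0, 5: 15.0, 6: 17.0}))
```

Keeping the original index i for each observed year, rather than renumbering the observed values consecutively, is what preserves the correct slope when years are missing.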

4.4 Analysis of Mangla Flood Peaks

The Historigram of the Mangla flood peaks data for the 89 years from 1925 to 2013
shows a decreasing line with a very small value of r² = 0.011 in Fig. (4.4.1). The value of R-square
of the least square line is close to zero, which indicates that the line is almost horizontal. That
means there is neither an upward nor a downward secular trend at Mangla.
Fig. (4.4.1) The Historigram of the flood peaks at Mangla site
[Trend analysis plot for Mangla: actual values and fits of the linear trend model Yt = 271526 - 829.501*t plotted against the index. Accuracy measures: MAPE = 8.46726E+01, MAD = 1.36120E+05, MSD = 4.27873E+10.]

The reduced values are $Z_k = X_k - 271525.5 + 829.500511\,k$.
The estimates $\hat{\alpha}$ and $\hat{\beta}$ are 271525.5 and -829.500511 respectively.

The maximum of $Z_k$ is 474880.5, corresponding to the
year 1992. The worst flood peaks after some years may also be estimated as $\hat{\alpha} + (10 + n)\,\hat{\beta} + Z_{max}$.

Table (4.5.1)

Years   Estimated flood peaks
2015    670921.4535
2016    670091.953
2017    670067.937

The above table contains the forecasts of the flood height in the coming years.

4.6 Analysis of site Shahdara Flood Peaks

The Historigram of the Shahdara flood peaks data for the 88 years from 1925 to 2012
shows a slightly decreasing line with a very small value of r² = 0.008. The value of R-square of the
least square line is close to zero, which indicates that the line is almost horizontal. That means
there is neither an upward nor a downward secular trend at Shahdara.
Fig. (4.6.1) The Historigram of the flood peaks at Shahdara site
[Trend analysis plot for Shahdara: actual values and fits of the linear trend model Yt = 102515 - 299.295*t plotted against the index. Accuracy measures: MAPE = 70, MAD = 51600, MSD = 7494464103.]

The reduced values are $Z_k = X_k - 102514.7 + 300.04\,k$.
The estimates $\hat{\alpha}$ and $\hat{\beta}$ are 102514.7 and -300.04 respectively.

The maximum of $Z_k$ is 492685.3, corresponding to the
year 1988. The worst flood peaks after some years may also be estimated as $\hat{\alpha} + (10 + n)\,\hat{\beta} + Z_{max}$.

Table (4.6.1) Forecasting of worst flood peaks

Years   Estimated flood peaks
2015    567900
2016    567600
2017    567300

Table (4.6.1) contains the forecast flood heights for three years at Shahdara.

4.7 Analysis of site Balakot Rainfall Station

The Historigram of the Balakot rainfall data for the 36 years from 1977 to 2012 shows an
increasing line with a value of r² = 0.28. The amount of rainfall may increase in coming years and
cause diverse floods. There may be some other reasons along with global warming. The
forecasting of the worst rainfall can play a vital role in decision making and hydrological
engineering. It provides the basis for developing design values for rainfall and flood
protection structures (dikes).

Fig. (4.7.1) The Historigram of the rainfall data of Balakot Site
[Trend analysis plot for Balakot: actual values and fits of the linear trend model Yt = 377.8 + 2.03*t plotted against the index. Accuracy measures: MAPE = 20.4, MAD = 73.3, MSD = 10481.3.]

The reduced values are $Z_k = X_k - 378.2527 - 2.223\,k$.
The estimates $\hat{\alpha}$ and $\hat{\beta}$ are 378.2527 and 2.223 respectively.

The maximum of $Z_k$ is 319.81, corresponding to the year
2010.

Table (4.7.1) Forecasting of worst rainfall

Years   Estimated rainfall
2015    784.75
2016    786.98
2017    789.44

4.8 Fitting parabolic trend to Flood and Rainfall data

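Let the series $X_t$ be considered an independent time series indexed by time, and let the series contain a
parabolic trend. Then we have the parabolic trend

$$ E[X_k] = \alpha + \beta k + \gamma k^2 \tag{4.8.1} $$

$$ S = \sum_{k=1}^{n} \left( X_k - \alpha - \beta k - \gamma k^2 \right)^2 \tag{4.8.2} $$

and

$$ \frac{\partial S}{\partial \alpha} = 0, \qquad \frac{\partial S}{\partial \beta} = 0 \qquad \text{and} \qquad \frac{\partial S}{\partial \gamma} = 0 $$

To find the estimator $\hat{\alpha}$ we get

$$ \frac{\partial S}{\partial \alpha} = 2 \sum_{k=1}^{n} \left( X_k - \alpha - \beta k - \gamma k^2 \right)(-1) $$

and setting it equal to zero we get

$$ \sum_{k=1}^{n} X_k - n\alpha - \beta \frac{n(n+1)}{2} - \gamma \frac{n(n+1)(2n+1)}{6} = 0 \tag{4.8.3} $$

Similarly, for $\beta$ and $\gamma$,

$$ \sum_{k=1}^{n} k X_k - \alpha \sum_{k=1}^{n} k - \beta \sum_{k=1}^{n} k^2 - \gamma \sum_{k=1}^{n} k^3 = 0 \tag{4.8.4} $$

$$ \sum_{k=1}^{n} k^2 X_k - \alpha \sum_{k=1}^{n} k^2 - \beta \sum_{k=1}^{n} k^3 - \gamma \sum_{k=1}^{n} k^4 = 0 \tag{4.8.5} $$

4.9 Measures of accuracy for time series data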

The forecasting of time series data is based on the selection of an appropriate model. We used
some techniques to measure the accuracy of the model, which are useful for comparing the
different fits to the sampled data and their forecasts. The three measures of accuracy
of the specified models are:
Mean Absolute Percentage Error (MAPE)
Mean Absolute Deviation (MAD)
Mean Squared Deviation (MSD)
Outliers have a significant effect on these measures; for example, the MAD is less affected by
outliers than the MSD. Generally, the model with the smallest values of these measures is
considered the better model.
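
The three measures are not defined explicitly in the text. The sketch below assumes their usual definitions: MAPE is the mean of |actual − fitted| / |actual| in percent, MAD is the mean absolute error and MSD is the mean squared error, each divided by the number of observations. The function name and the sample values are illustrative.

```python
def accuracy_measures(actual, fitted):
    """Return (MAPE in percent, MAD, MSD) for paired actual and fitted values.

    Assumed definitions: MAPE = 100 * mean(|e_t / y_t|), MAD = mean(|e_t|),
    MSD = mean(e_t^2), with e_t = y_t - fitted_t and all y_t non-zero.
    """
    n = len(actual)
    errors = [a - f for a, f in zip(actual, fitted)]
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    return mape, mad, msd


# Hypothetical actual and fitted values (illustrative only).
print(accuracy_measures([100.0, 200.0], [110.0, 180.0]))
```

Because the MSD squares each error, a single large residual inflates it far more than it inflates the MAD, which is the sense in which the MAD is less affected by outliers.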
4.10 Risalpur Rainfall Site

A visual investigation of the data suggests that a quadratic model could be useful for
explaining the trend in the observed data. The value of R-square (only 11.9% of the variation
explained) shows that the rainfall changes very little with the passage of time. The estimated
quadratic model is as follows:

$$ Z_k = X_k - 401.2 + 10.68\,k - 0.327\,k^2 $$

R-Sq = 11.9%
The R-square statistic is a measure of the strength of association between the observed and
model-predicted values of the dependent variable; large R-square values indicate strong
relationships. The R-square for the quadratic model is larger than that for the linear model, though it is not
clear whether this is due to the quadratic model capitalizing on chance with an extra parameter.
The scatter plot of the Risalpur site shows the parabolic trend in the observed data.
A visual clue from the figure indicates that there are some outliers present in the data. In order
to obtain a more precise examination one can investigate the reasons for these outliers.
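
A quadratic trend of this kind can be obtained by solving the three normal equations (4.8.3) to (4.8.5). The following is a minimal sketch using plain Gauss-Jordan elimination; the function names are illustrative, at least three observations are assumed, and the example series is synthetic rather than the Risalpur data.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def quadratic_trend(x):
    """Least-squares alpha, beta, gamma for E[X_k] = alpha + beta*k + gamma*k^2."""
    ks = range(1, len(x) + 1)
    s = [sum(k ** p for k in ks) for p in range(5)]              # sums of k^0 .. k^4
    t = [sum(k ** p * xk for k, xk in zip(ks, x)) for p in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    return solve3(normal, t)


# Synthetic series lying exactly on 1 + 2k + 3k^2, k = 1..5.
print(quadratic_trend([1 + 2 * k + 3 * k * k for k in range(1, 6)]))
```

For long series the sums of k³ and k⁴ become large, so a statistical package or an orthogonal-polynomial formulation would usually be preferred in practice.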

Table (4.10.1) Measures of accuracy

                    Trend Methods
Measure    Linear Method    Quadratic    Exponential
MAPE       24.4             23.19        23.7
MAD        78.0             72.97        75.4
MSD        10543.1          9547.50      10740.8

Table (4.10.2) Scatter plot and residuals
[(a) Fitted line plot of Risalpur: scatter plot of the quadratic model, Risalpur against k, with S = 101.733, R-Sq = 11.9%, R-Sq(adj) = 6.5%. (b) Residuals versus fits of Risalpur: residuals against fitted values.]

The quadratic model provides the best prediction of the amount of rainfall compared with the other methods. Table (4.10.2) shows the difference between the values estimated by the different methods for the same year, which clearly shows that misspecification of the model will give misleading results.

Table (4.10.2) Forecasting

Years    Linear     Quadratic    Exponential
2013     377.408    454.100      359.241
2014     378.838    467.966      360.518
2015     380.269    482.488      361.800
2016     381.699    497.664      363.086
2017     383.130    513.494      364.377

A large difference is observed between the forecasts of the selected models. A large amount of rainfall is expected according to the second-degree curve.
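The comparison of the three trend models can be sketched in Python. This is a minimal illustration only: it assumes ordinary least squares for the linear and quadratic trends and a log-linear least-squares fit for the exponential trend, which may differ in detail from the software used for the tables above. The function name `fit_trends` and the demonstration series `y_demo` are hypothetical and are not the thesis data.

```python
import numpy as np

def fit_trends(y, horizon=5):
    """Fit linear, quadratic and exponential trends to y indexed k = 1..n.

    Returns a dict mapping each model name to its forecasts for
    k = n+1, ..., n+horizon.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    k = np.arange(1, n + 1)
    k_new = np.arange(n + 1, n + horizon + 1)

    lin = np.polyfit(k, y, 1)            # coefficients: slope, intercept
    quad = np.polyfit(k, y, 2)           # coefficients: k^2, k, constant
    expo = np.polyfit(k, np.log(y), 1)   # assumption: log y = a + b*k

    return {
        "linear": np.polyval(lin, k_new),
        "quadratic": np.polyval(quad, k_new),
        "exponential": np.exp(np.polyval(expo, k_new)),
    }

# Hypothetical, exactly quadratic series (illustration only).
k = np.arange(1, 37)
y_demo = 400.0 - 10.0 * k + 0.3 * k**2
forecasts = fit_trends(y_demo)
```

On a series with upward curvature, the linear and exponential fits extrapolate below the quadratic fit, which is the kind of gap between models seen in the forecast table.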

Figure (4.10.2) Graphical comparison of three models: trend analysis plots for Risalpur under the linear, quadratic and growth curve trend models, showing actual values, fits and forecasts against the index.


The quadratic model gives the smallest values of the accuracy measures among all three methods, so the quadratic model is the better choice for forecasting the rainfall in the coming years.
4.11 Dir Rainfall

The rainfall data at the Dir site exhibit a quadratic trend. Figure (4.11.1) contains the fitted line plot of the Dir site and the plot of the residuals against the fitted values. The value of R-square is also very small, i.e. 18.5%.

Figure (4.11.1) (a) Scatter plot: fitted line plot of Dir against k (S = 25.4476, R-Sq = 18.5%, R-Sq(adj) = 13.5%). (b) Residuals plot: residuals versus fitted values.

The estimated model is as follows.

$Z_k = X_k = 120.9 + 0.67k + 0.0157k^2$

R-Sq = 18.5%
Table (4.11.1) Measures of accuracy of trend methods

Measure    Linear Method    Quadratic    Exponential
MAPE       11.689           11.594       11.327
MAD        16.964           16.853       16.695
MSD        560.513          558.224      563.690

Two of the accuracy measures, MAPE and MAD, are smaller for the exponential model than for the other models. From figure (4.11.1 a) one can see that the presence of outliers can change the true picture of the model.
Table (4.11.2) Forecasting of worst rainfall

Years    Linear     Quadratic    Exponential
2013     163.282    166.959      160.698
2014     164.528    168.801      162.030
2015     165.774    170.675      163.374
2016     167.020    172.580      164.728
2017     168.265    174.516      166.093


From the accuracy measures we can see that the Exponential model receives the smaller values
of MAPE and MAD while the quadratic model has the minimum value of MSD.
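The three accuracy measures can be written out explicitly. The sketch below uses the usual definitions (MAPE as the mean absolute percentage error, MAD as the mean absolute deviation, MSD as the mean squared deviation); the function name `accuracy_measures` and the small data set are hypothetical and serve only to show how MAPE and MAD can favour one model while MSD favours another.

```python
import numpy as np

def accuracy_measures(actual, fitted):
    """Return (MAPE, MAD, MSD) for fitted values against actual values."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    err = actual - fitted
    mape = np.mean(np.abs(err / actual)) * 100.0  # percentage error
    mad = np.mean(np.abs(err))                    # absolute deviation
    msd = np.mean(err ** 2)                       # squared deviation
    return mape, mad, msd

# Hypothetical observations and two hypothetical sets of fitted values.
actual = [100.0, 200.0, 400.0]
fit_a = [110.0, 190.0, 400.0]   # two moderate errors
fit_b = [100.0, 200.0, 385.0]   # one larger error

mape_a, mad_a, msd_a = accuracy_measures(actual, fit_a)
mape_b, mad_b, msd_b = accuracy_measures(actual, fit_b)
```

Here `fit_b` has the smaller MAPE and MAD but the larger MSD, because squaring penalizes its single large error more heavily. The same mechanism can explain why the measures disagree about the best model for a site with outliers.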
4.12 Kohat Rainfall

The Kohat site has a parabolic trend. Visual inspection of the graph suggests the presence of outliers in the data. The estimated model is as follows.

$Z_k = X_k = 134.3 + 1.04k - 0.0218k^2$

Figure (4.12.1) Scatter plot of Kohat rainfall site. (a) Fitted line plot of Kohat against k (S = 22.4230, R-Sq = 4.2%, R-Sq(adj) = 0.0%). (b) Residuals versus fitted values.
Figure (4.12.1) shows that there is no linear trend in the data. Using the measures of accuracy, we can observe that the quadratic model gives the smallest MSD for the Kohat rainfall site.

Table (4.12.1) Measures of accuracy of trend methods

Measure    Linear Method    Quadratic    Exponential
MAPE       12.006           11.960       11.674
MAD        17.475           17.368       17.183
MSD        474.968          470.556      477.505

Table (4.12.2) Forecasting of rainfall

Years    Linear     Quadratic    Exponential
2013     148.110    143.005      146.023
2014     148.346    142.413      146.233
2015     148.582    141.777      146.444
2016     148.818    141.098      146.655
2017     149.054    140.376      146.866


Figure (4.12.2) Graphical comparison of three models: trend analysis plots for Kohat under the linear, quadratic and growth curve trend models, showing actual values, fits and forecasts against the index.
The exponential model has smaller MAPE and MAD values than the other models. We can also observe from the scatter plot in Figure (4.12.1) that the pattern does not follow a linear or quadratic trend. The best choice for these data is the exponential model.
4.13 Marala Rainfall Site

The Marala flood site has a parabolic trend. Visual inspection of the graph suggests the presence of outliers in the data. The estimated model is as follows.

$Z_k = X_k = 246051 + 6423k - 75.15k^2$

R-Sq = 4.4%

Figure: (a) Fitted line plot of Marala (S = 207623, R-Sq = 4.4%, R-Sq(adj) = 2.2%). (b) Residuals versus fits of Marala.

Table (4.13.1) Measures of accuracy of trend methods

Measure    Linear Method    Quadratic      Exponential
MAPE       56.9958          55.3978        56.9169
MAD        159975           156370         157535
MSD        4.35181E+10      4.16378E+10    4.59399E+10

Table (4.13.2) Forecasting of worst flood

Years    Linear    Quadratic    Exponential
2013     322730    222405       266671
2014     322465    215376       266263
2015     322199    208196       265855
2016     321933    200867       265448
2017     321668    193387       265041


Figure: Trend analysis plots for Marala under the linear, quadratic and growth curve trend models, showing actual values, fits and forecasts against the index.


From table (4.13.1) we can see that the quadratic model has the smallest values of MAPE, MAD and MSD. By these measures of accuracy the best model is the quadratic model.
4.14 D.G. Khan Site

The estimated quadratic model is as follows.

$Z_k = X_k = 10.92 + 2.17k - 0.0713k^2$

Figure (4.14.1) (a) Line plot of D.G.Khan against k (S = 9.78139, R-Sq = 16.3%, R-Sq(adj) = 8.4%). (b) Residuals versus fits of D.G.Khan.
Table (4.14.1) Measures of accuracy of trend methods

Measure    Linear Method    Quadratic    Exponential
MAPE       36.1637          33.3424      32.4880
MAD        7.6410           7.2195       7.4172
MSD        92.9969          83.7162      96.9935

Table (4.14.2) Forecasting

Years    Linear     Quadratic    Exponential
2012     28.2392    20.5184      26.8601
2013     28.6230    19.0494      27.3407
2014     29.0069    17.4377      27.8298
2015     29.3908    15.6836      28.3277
2016     29.7747    13.7869      28.8345


Figure (4.14.2) Trend analysis plots for D.G.Khan under the linear, quadratic and growth curve trend models, showing actual values, fits and forecasts against the index.
The quadratic model is a better fit for forecasting at the D.G.Khan rainfall site.
4.15 Terbela Flood Site

The quadratic model seems to be a good model for the Terbela flood site. There is an indication of outliers in the data, which needs to be investigated in order to make better predictions of future flood amounts. The R-square value is only about 0.6%, which shows that very little of the variation is accounted for. The estimated model is

$Z_k = X_k = 363419 + 2108k - 40.8k^2$

R-Sq = 0.6%

Figure (4.15.1) (a) Fitted line plot of Terbela against k (S = 100191, R-Sq = 1.0%, R-Sq(adj) = 0.0%). (b) Residuals versus fits of Terbela.
Table (4.15.1) Measures of accuracy of trend methods

Measure    Linear Method    Quadratic     Exponential
MAPE       16               16            18
MAD        63210            62645         63444
MSD        9254221795       9238780213    9386600743

Table (4.15.2) Forecasting

Years    Linear    Quadratic    Exponential
2013     395152    385601       373603
2014     395752    384652       373533
2015     396351    383621       373463
2016     396951    382509       373394
2017     397550    381315       373324


Figure: Trend analysis plots for Terbela under the linear, quadratic and growth curve trend models, showing actual values, fits and forecasts against the index.

4.16 Muzafarabad

The linear model seems to be a good model for the Muzafarabad rainfall site. There is an indication of outliers in the data, which needs to be investigated in order to make better predictions of future rainfall amounts. The R-square value is reported as about 70%. The estimated model is

$k $k 2
Zk X k
Z k X k 129.562 0.053 k 0.004k 2
R-Square= 70.2%

Figure: (a) Fitted line plot of Muzafarabad against k (S = 22.0272, R-Sq = 0.7%, R-Sq(adj) = 0.0%). (b) Residuals versus fits of Muzafarabad.

Table (4.16.1) Measures of accuracy of trend methods

Measure    Linear Method    Quadratic    Exponential
MAPE       14.434           14.450       14.476
MAD        17.705           17.726       17.841
MSD        459.259          459.205      462.689

Figure (4.16.2) Graphical comparison of three models: trend analysis plots for Muzafarabad under the linear, quadratic and growth curve trend models, showing actual values, fits and forecasts against the index.

The scatter plot of Muzafarabad suggests a linear trend. From the accuracy table, the linear trend is found to be a good model.

Table (4.16.2) Forecasting of rainfall

Years    Linear     Quadratic    Exponential
2011     123.855    123.311      121.395
2012     123.745    123.144      121.265
2013     123.635    122.975      121.135
2014     123.526    122.804      121.006
2015     123.416    122.631      120.876

The above estimation methods are used as a preliminary analysis of the rainfall and flood data. Further analysis is needed to evaluate the forecasting of, and warnings about, floods.

CHAPTER 5
ANALYSIS OF RECORD VALUES
5.1

Introduction

A record is a value that is smaller or larger than all the previous values. Let $\{X_j\}$, $j = 1, 2, 3, \ldots, n$, be a sequence of independent random variables. A value that is larger than all the previous values is called an upper record value, and a value that is smaller than all the previous values is called a lower record value.

Let $X_{kj}$ be the level of flood in the river on the $k$th day at the $j$th site. If we are interested in the local maxima of the flood height $X_{kj}$, then these local maxima are known as upper record values, and the local minima are known as lower record values.

Chandler (1952) introduced the concept of record times and record values. He found that the expected inter-record time is infinite, whatever probability distribution the random variable follows. Feller (1966) quoted different examples related to gambling problems which are based on record values.
There are many practical examples of record values in real life as well as in statistical settings such as economics, sports and weather. Sometimes we are interested in finding new records and maintaining them for further analysis and comparison. Examples include Olympic records, world records in sports, records of earthquakes, records of rainfall and the highest flood peaks over the years.
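The definition of upper and lower record values can be illustrated with a short Python sketch. It assumes the common convention that the first observation counts as both the first upper and the first lower record, and that ties do not set a new record; the function name `record_values` and the sample sequence are hypothetical.

```python
def record_values(seq):
    """Return (upper_records, lower_records) of a sequence.

    An observation is an upper record if it is strictly larger than every
    earlier observation, and a lower record if it is strictly smaller.
    The first observation is taken as both (an assumed convention).
    """
    upper, lower = [], []
    for x in seq:
        if not upper or x > upper[-1]:
            upper.append(x)
        if not lower or x < lower[-1]:
            lower.append(x)
    return upper, lower

# Hypothetical sequence of yearly flood peaks (illustration only).
peaks = [5, 3, 8, 8, 2, 9, 1]
upper, lower = record_values(peaks)
```

For this sequence the upper records are 5, 8, 9 and the lower records are 5, 3, 2, 1. Records become rarer as the sequence grows, which is why long data series yield only a handful of record values.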
5.2 Probability Density Function of Upper Record Values

Let $x_1, x_2, x_3, \ldots, x_n$ be independent and identically distributed random variables from any distribution having probability density function $f(x)$ and cumulative distribution function $F(x)$, with a specified sample size. If $X_{U(1)}, X_{U(2)}, X_{U(3)}, \ldots, X_{U(r)}$ are the upper record values, then the probability density function of $X_{U(r)}$ is

$f_r(x) = \frac{1}{\Gamma(r)} [R(x)]^{r-1} f(x), \quad -\infty < x < \infty$    (5.2.1)

where, in terms of the reliability function $1 - F(x)$,

$R(x) = -\ln[1 - F(x)]$ and $r(x) = \frac{d}{dx} R(x) = \frac{f(x)}{1 - F(x)}$.

The joint probability density function of the first $r$ upper record values is written as follows:

$f_{1,2,\ldots,r}(x_{U(1)}, x_{U(2)}, \ldots, x_{U(r)}) = f(x_{U(r)}) \prod_{i=1}^{r-1} \frac{f(x_{U(i)})}{1 - F(x_{U(i)})}$    (5.2.2)

The joint probability density function of the upper record values $X_{U(r)}$ and $X_{U(s)}$ is as follows:

$f_{r,s}(x, y) = \frac{1}{\Gamma(r)\,\Gamma(s-r)} [R(x)]^{r-1} [R(y) - R(x)]^{s-r-1} r(x) f(y), \quad -\infty < x < y < \infty$    (5.2.3)

where $r < s$.
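As a numerical check of equation (5.2.1), the sketch below evaluates the density of the r-th upper record for a standard exponential parent distribution, for which $R(x) = x$ and the density reduces to a gamma density with mean r. The choice of the exponential distribution, the grid, and the names `upper_record_pdf` and `trapezoid` are assumptions made for illustration; the survival function $1 - F(x)$ is passed directly to avoid rounding problems in the tail.

```python
import math
import numpy as np

def upper_record_pdf(x, r, pdf, sf):
    """Density of the r-th upper record, equation (5.2.1).

    pdf is f(x) and sf is the survival function 1 - F(x),
    so that R(x) = -ln(sf(x)).
    """
    big_r = -np.log(sf(x))
    return big_r ** (r - 1) * pdf(x) / math.gamma(r)

def trapezoid(y, x):
    """Plain trapezoidal rule on a grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Assumed parent distribution: standard exponential.
exp_pdf = lambda x: np.exp(-x)
exp_sf = lambda x: np.exp(-x)

grid = np.linspace(0.0, 60.0, 60001)
r = 3
density = upper_record_pdf(grid, r, exp_pdf, exp_sf)
total_mass = trapezoid(density, grid)          # should be close to 1
mean_record = trapezoid(grid * density, grid)  # should be close to r
```

The density integrates to one and the mean of the third upper record equals 3 for this parent distribution, which agrees with the gamma form of (5.2.1) when $R(x) = x$.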
5.3 Probability Density Function of Lower Record Values

Let $x_1, x_2, x_3, \ldots, x_n$ be independent and identically distributed random variables from any distribution having probability density function $f(x)$ and cumulative distribution function $F(x)$, with a specified sample size. If $X_{L(1)}, X_{L(2)}, X_{L(3)}, \ldots, X_{L(r)}$ are the lower record values, then the pdf of $X_{L(r)}$ is

$f_r(x) = \frac{1}{\Gamma(r)} [H(x)]^{r-1} f(x), \quad -\infty < x < \infty$    (5.3.1)

where

$H(x) = -\ln[F(x)]$ and $h(x) = -\frac{d}{dx} H(x) = \frac{f(x)}{F(x)}$.

The joint density function of the $r$ lower record values $X_{L(1)}, X_{L(2)}, X_{L(3)}, \ldots, X_{L(r)}$ is given as

$f_{1,2,\ldots,r}(x_{L(1)}, x_{L(2)}, \ldots, x_{L(r)}) = f(x_{L(r)}) \prod_{i=1}^{r-1} \frac{f(x_{L(i)})}{F(x_{L(i)})}$    (5.3.2)

The joint pdf of $X_{L(r)}$ and $X_{L(s)}$ is

$f_{r,s}(x, y) = \frac{1}{\Gamma(r)\,\Gamma(s-r)} [H(x)]^{r-1} [H(y) - H(x)]^{s-r-1} h(x) f(y), \quad -\infty < y < x < \infty$    (5.3.3)


5.4 Properties of Record Values

Let $X_1, X_2, X_3, \ldots, X_n$ be independent and identically distributed random variables from any distribution having probability density function $f(x)$ and cumulative distribution function $F(x)$. If $X_{U(1)}, X_{U(2)}, X_{U(3)}, \ldots, X_{U(r)}$ are the upper record values, then the properties of record values are as follows:

(i) The joint density function of the $r$ upper record values $X_{U(1)}, X_{U(2)}, \ldots, X_{U(r)}$ is given by

$f_{1,2,\ldots,r}(x_{U(1)}, x_{U(2)}, \ldots, x_{U(r)}) = f(x_{U(r)}) \prod_{i=1}^{r-1} f(x_{U(i)}) [1 - F(x_{U(i)})]^{-1}$    (5.4.1)

(ii) The pdf of $X_{U(r)}$ is

$f_r(x) = \frac{1}{\Gamma(r)} [R(x)]^{r-1} f(x), \quad -\infty < x < \infty$    (5.4.2)

where $R(x) = -\ln[1 - F(x)]$ and $r(x) = \frac{d}{dx} R(x) = f(x)[1 - F(x)]^{-1}$.

(iii) The joint pdf of $X_{U(r)}$ and $X_{U(s)}$ is

$f_{r,s}(x, y) = \frac{1}{\Gamma(r)\,\Gamma(s-r)} [R(x)]^{r-1} [R(y) - R(x)]^{s-r-1} r(x) f(y), \quad -\infty < x < y < \infty \text{ and } r < s$    (5.4.3)

(iv) The $n$th moment of upper record values can be calculated as

$\mu^{(n)}_{(r)} = \frac{1}{\Gamma(r)} \int_{-\infty}^{\infty} x^n [R(x)]^{r-1} f(x)\, dx$    (5.4.4)

(v) The product moment of the $n$th and $m$th powers of upper record values is as follows:

$\mu^{n,m}_{r,s} = E(X^n_{U(r)} X^m_{U(s)})$    (5.4.5)

(vi) The variance and covariance of upper record values are

$V(X_{U(r)}) = \mu^{(2)}_{(r)} - (\mu^{(1)}_{(r)})^2$, $\quad Cov(X_{U(r)}, X_{U(s)}) = \mu^{1,1}_{(r),(s)} - \mu^{(1)}_{(r)} \mu^{(1)}_{(s)}$    (5.4.6)

5.5 Properties for Lower record values


(i) The joint density function of the $r$ lower record values $X_{L(1)}, X_{L(2)}, X_{L(3)}, \ldots, X_{L(r)}$ is given as

$f_{1,2,\ldots,r}(x_{L(1)}, x_{L(2)}, \ldots, x_{L(r)}) = f(x_{L(r)}) \prod_{i=1}^{r-1} f(x_{L(i)}) [F(x_{L(i)})]^{-1}$    (5.5.1)

(ii) The probability density function of $X_{L(r)}$ is

$f_r(x) = \frac{1}{\Gamma(r)} [H(x)]^{r-1} f(x), \quad -\infty < x < \infty$    (5.5.2)

where $H(x) = -\ln[F(x)]$ and $h(x) = -\frac{d}{dx} H(x) = f(x)[F(x)]^{-1}$.

(iii) The joint pdf of $X_{L(r)}$ and $X_{L(s)}$ is

$f_{r,s}(x, y) = \frac{1}{\Gamma(r)\,\Gamma(s-r)} [H(x)]^{r-1} [H(y) - H(x)]^{s-r-1} h(x) f(y), \quad -\infty < y < x < \infty$    (5.5.3)

(iv) The $n$th moment of lower record values is defined as

$\mu^{(n)}_{(r)} = \frac{1}{\Gamma(r)} \int_{-\infty}^{\infty} x^n [H(x)]^{r-1} f(x)\, dx$    (5.5.4)

(v) The product moment of the $n$th and $m$th powers of lower record values is defined as

$\mu^{n,m}_{r,s} = E(X^n_{L(r)} X^m_{L(s)})$    (5.5.5)

(vi) The variance and covariance of lower record values are defined as

$V(X_{L(r)}) = \mu^{(2)}_{(r)} - (\mu^{(1)}_{(r)})^2$, $\quad Cov(X_{L(r)}, X_{L(s)}) = \mu^{1,1}_{(r),(s)} - \mu^{(1)}_{(r)} \mu^{(1)}_{(s)}$    (5.5.6)

5.6 Lower Record Values from IWD

X L 1 ,X L 2 ,........,X L r
Let

are the first r lower record values from the Inverse Weibull distribution

with probability density function having single parameter


f x x 1e x

x 0, 0
(5.6.1)

And the cumulative distribution function (cdf) is as follows


F x e x

(5.6.2)
The distribution of r lower record values is
f r ( x)

1
[ H x ]r 1 f ( x )
r

H ( x) x
where

Then the probability distribution function of LRV from inverse weibul is


fr x

1 r 1 x
x
e
r

(5.6.3)

h( x )

H x x
Where

and

d
(x ) = x 1
dx

The mean of LRV from inverse weibul can be found by definition of expectation of a random
variables X

x f x dx

E( X )

c r 1 x
( x ) x e dx
r0

E( X )

(5.6.4)
z x

After substituting

we get the value

r 1

E( X )

E( X

r 2

r 1

Var ( X )

r 1

(5.6.5)

(5.6.6)

The inverse weibull distribution having two parameters is as follows:

f ( x, , ) x 1e x

, 0
(5.6.7)

Where is a scale parameter and is a shape parameter


The pdf of LRV from inverse weibull distribution with two parameters

f r ( x)

r
r

x r 1e x

(5.6.8)

The mean and variance is as follows:

E ( x)

r
r

r 1 x
dx
x e
0

(5.6.9)

E ( x)

2
r 1
r 2

E(x2 )

(5.6.10)

r 1


r 2

Var ( x)

5.7

(5.6.11)

Lower Record Values from Frechet distribution

X L 1 ,X L 2 ,........,X L r
Let

are the first lower record values from the Frechet distribution with

probability density function

f ( x)

0, 0, and x 0

(5.7.1)

Where is a scale parameter is a shape parameter and is a location


parameter. And the cumulative distribution function (cdf) of Frechet distribution is as follows:
x
F ( x) exp

(5.7.2)

Then the pdf of LRV from Frechet distribution is

x
f r ( x)

r 1

x
exp

(5.7.3)

E ( x)

1
1
r

E ( x2 )

(5.7.4)

1 2
2
1
2
r 2 r r

1
2
1
1
1
Var ( x) 2 r 2 r 2 r r

r
r

5.8

(5.7.5)

(5.7.6)

Maximum Likelihood Method

The likelihood function of the first n lower record values is given by (Arnold et al.)

L f xL n

f xL i

F
n 1
i 1

Li

(5.8.1)
We have the likelihood function for inverse Weibul Distribution of lower record values with
single parameter
n

L( , x) n X L n c1 exp X L n c
i 1

(5.8.2)

And the log likelihood function is


n

LogL , x n log c 1 log xL i xLcn


i 1

(5.8.3)

Differentiate with respect to we get


LogLf ( x ) n
X L n c

After setting

LogLf ( x)
0

equal to zero we get

(5.8.4)

n
X L n c

(5.8.5)
2 LogLf ( x )
n
2
2

2 LogLf ( x )
0
2

(5.8.6)

The variance of the ML estimator by using Rao Cramer Lower Bound we have

var $
n

(5.8.7)

(Tayyab et al) proposed the condition for unbiased estimator of MLE

n 1 x c L n
(5.8.8)

which gives an unbiased estimator

of MLE

E
that is

The MLE of Frechet distribution

x
f ( x)

The likelihood function of Frechet distribution is

n n x
Lf ( x) n i
i 1


i 1

(5.8.9)

and the log likelihood function is


n
x
x
LogLf ( x) n log n log 1 log




i 1
i 1
n

(5.8.10)

Partially differentiate with respect to and we get

LogLf ( x )
n
n
1
1

(5.8.11)

xL n n
LogLf ( x) n n
1
log

i1

(5.8.12)

When is known but and are unknown then the ML estimators are

n
x

n 1 log
i 1

n xL n

(5.8.13)

(5.8.14)

We purposed formulae for unbiased MLE

Z
in2
n

where Z i

X L n

(5.8.15)

E Z i

E
n n2

(5.8.16)

As we have

E Zi

H x

n 1

f x dx
(5.8.17)

E Z i

n xL n


n 0

After substituting

1n 1

xL n

xL n
t

dx

(5.8.18)

we get

E Zi n n1

(5.8.18)
Putting the value of equation (5.8.18) in equation (5.8.16) we get

E
5.9 Means and Variances

Means and variances of LRV from the inverse Weibull distribution

Table (5.9.1) (parameter values 3 and 2)

r    Means     Variance
1    1.2599    1.3386
2    1.1373    0.1238
3    0.9477    0.0349
4    0.8424    0.0252
5    0.7722    0.0161
6    0.7207    0.0113
7    0.7791    0.0111

Table (5.9.1) contains the means and variances of lower record values from the inverse Weibull distribution with known values of the shape and scale parameters. As the table shows, the mean and variance tend to decrease as the value of r increases.
Means and variances of LRV from the Frechet distribution

Table (5.9.2) (parameter values 3, 3 and 2)

r    Means     Variance
1    6.0623    2.7846
2    4.7084    0.7016
3    4.2569    0.2643
4    4.0062    0.1424
5    3.8390    0.0900
6    3.7164    0.0589
7    3.6211    0.0470
8    3.5439    0.0366

Table (5.9.3) (parameter values 5, 5 and 4)

r    Means     Variance
1    9.8210    3.3460
2    8.6570    0.649
3    8.1913    0.3026
4    7.9183    0.1290
5    7.7162    0.0957
6    7.5676    0.0827
7    7.4487    0.0749
8    7.3501    0.0615

Tables (5.9.2) and (5.9.3) contain the means and variances of lower record values from the Frechet distribution with different known values of the shape, scale and location parameters. As the tables show, the mean and variance decrease as the value of r increases.
CHAPTER 6
STATIONARY MODELS

6.1 Introduction
Data collected over time are the subject of time series analysis. Any group of observations arranged in chronological order is called time series data.
A large number of excellent texts on time series are available. Most authors focus on stationary time series, and some make a good contribution on globally non-stationary series, which are used in financial time series. Most authors suggest the book by Chatfield (2003) as a preliminary introduction to time series. There are many other useful books on time series, e.g. Priestley (1983), Diggle (1990), Brockwell and Davis (1991), and Hamilton (1994). The book by Hannan (1960) is concise (but concentrated), and Pole et al. (1994) is a good introduction to a Bayesian way of doing time series analysis.
Data may be recorded hourly, daily, monthly, quarterly, yearly and so on. Examples include hourly temperature records at specific locations, weekly prices of rice or wheat in Okara, monthly production and consumption of electricity in a certain area, monthly or annual rainfall, and yearly flood peaks at different dams in Pakistan. All of these data are related to time and are treated as time series data. Although a large amount of data is available for research in the social sciences, the quality of the data is often lacking. We should never ignore the fact that the results obtained from data are only as good as the quality of the data.
Time series data exhibit a natural ordering over time, so there is a high chance that successive observations are correlated, especially when the interval between observations is short, such as an hour, day, week or month rather than a year.
Before discussing the further analysis, I would like to quote a beautiful remark:
"Experience with real-world data, however, soon convinces one that both stationarity and Gaussianity are fairy tales invented for the amusement of undergraduates."
(Thomson 1994)
Keeping this in mind, we observe in the literature that stationary models provide the basis for a great portion of time series analysis.
Time series forecasting and modeling have become very popular nowadays in economic and environmental topics. George Box and Gwilym Jenkins (1976) used the form ARIMA(p,d,q) to describe a large class of models that can describe the behavior of many observed time series, where d indicates the number of times the series must be differenced to make it stationary.
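The role of the differencing order d can be shown with a small deterministic example. The sketch below is illustrative only; the helper name `difference` and the two series are hypothetical. A series with a linear trend becomes constant after one difference (d = 1), and a series with a quadratic trend becomes constant after two differences (d = 2).

```python
import numpy as np

def difference(series, d=1):
    """Apply the first-difference operator d times."""
    return np.diff(np.asarray(series, dtype=float), n=d)

t = np.arange(10)
linear_trend = 3.0 + 2.0 * t        # hypothetical series with linear trend
quadratic_trend = 1.0 * t ** 2      # hypothetical series with quadratic trend

first_diff = difference(linear_trend, d=1)       # constant slope
second_diff = difference(quadratic_trend, d=2)   # constant curvature
```

Each differencing step shortens the series by one observation, so a series of 10 values yields 9 first differences and 8 second differences.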

6.2 Tests for the Detection of Stationarity

In practice, any researcher faces two important questions: (i) how can one determine whether a given time series is stationary, and (ii) how can a time series be made stationary if it is found to be non-stationary? A variety of techniques are available for the detection of stationarity. Examining how the mean and variance of a time series change with time gives a visual clue for this purpose (a constant mean and variance suggest stationarity in the data).
6.3 Unit Root Test

A stationary time series provides the basis for inferential analysis of time series data. The unit root test has been the most popular and widely used test for this purpose in the literature over the last several decades. It is based on the following model:

$X_t = \rho X_{t-1} + \varepsilon_t, \quad -1 \le \rho \le 1$

If $\rho = 1$ in the above model, the process is a random walk and we face a unit root problem, that is, a problem of non-stationarity.

$X_t = \rho X_{t-1} + \varepsilon_t$    (6.3.1)

$X_t - X_{t-1} = \rho X_{t-1} - X_{t-1} + \varepsilon_t$    (6.3.2)

$\Delta X_t = (\rho - 1) X_{t-1} + \varepsilon_t$    (6.3.3)

$\Delta X_t = \delta X_{t-1} + \varepsilon_t$    (6.3.4)

where $\Delta$ is the first-difference operator and $\delta = \rho - 1$. As equation (6.3.3) shows, if $\rho = 1$ then the given time series is non-stationary, which implies that $\delta = 0$, so we test the null hypothesis $H_0: \delta = 0$. When $\rho = 1$, equation (6.3.3) becomes

$\Delta X_t = (X_t - X_{t-1}) = \varepsilon_t$
6.4 Dickey and Fuller Test

Dickey and Fuller (1979) overcame the problem that arises in the unit root test. They developed the $\tau$ (tau) statistic for the estimated coefficient of $X_{t-1}$, with critical values obtained by Monte Carlo simulation. They proposed three versions of the regression for the test, as follows:

1. $\Delta Y_t = \delta Y_{t-1} + \varepsilon_t$    (6.4.1)
2. $\Delta Y_t = \beta_1 + \delta Y_{t-1} + \varepsilon_t$    (6.4.2)
3. $\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \varepsilon_t$    (6.4.3)

The third model is a random walk with drift around a stochastic trend. In all of the above models the authors assumed that the random error term is uncorrelated. Dickey and Fuller (1979) also developed the Augmented Dickey-Fuller (ADF) test for a unit root, which is most appropriate when the error terms are correlated.
The model of the test is

$\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta Y_{t-i} + \varepsilon_t$    (6.4.4)

where $\Delta$ is the first-order difference operator, $\beta_1$ is the constant (intercept) and $\beta_2$ is the coefficient on the time trend.
The ADF test is performed in Excel with the following hypotheses:

$H_0: \delta = 0$ (the process has a unit root, i.e. it is non-stationary)
$H_1: \delta < 0$ (the process does not have a unit root, i.e. it is stationary)

The test statistic used is the $\tau$ statistic described above. In terms of the observed series $X_k$:

H0: Xk has a unit root (non-stationary process)
H1: Xk does not have a unit root (stationary process)
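The simplest version of the test regression, $\Delta Y_t = \delta Y_{t-1} + \varepsilon_t$, can be sketched with NumPy alone. This is an illustrative implementation under stated assumptions, not the Excel procedure used in the thesis: it has no constant, no trend and no lagged differences, and the function name `dickey_fuller_tau` and the simulated AR(1) series are hypothetical. The resulting statistic must be compared with Dickey-Fuller critical values, not with ordinary t-table values.

```python
import numpy as np

def dickey_fuller_tau(x):
    """Estimate delta and the tau statistic from the regression
    diff(x)_t = delta * x_{t-1} + error (no constant, no trend)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)          # first differences
    lag = x[:-1]             # lagged levels
    delta = np.sum(lag * dx) / np.sum(lag ** 2)      # least-squares slope
    resid = dx - delta * lag
    s2 = np.sum(resid ** 2) / (len(dx) - 1)          # residual variance
    se = np.sqrt(s2 / np.sum(lag ** 2))              # standard error of delta
    return delta, delta / se

# Simulated stationary AR(1) series with rho = 0.5 (illustration only).
rng = np.random.default_rng(0)
eps = rng.normal(size=2000)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]

delta_hat, tau = dickey_fuller_tau(ar1)
```

For this stationary series the estimate of $\delta = \rho - 1$ is close to -0.5 and $\tau$ is strongly negative, so the null hypothesis of a unit root would be rejected. For a random walk, $\delta$ is close to zero and $\tau$ typically stays above the critical value.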
6.5 Phillips Perron (PP) Unit Root Test

The Dickey and Fuller test requires the assumption that the random error term is independently and identically distributed. The Augmented Dickey Fuller (ADF) test was designed to overcome the problem of serial correlation in the random error term. Phillips and Perron (1988) suggested nonparametric unit root tests that deal with serially correlated error terms without adding lagged difference terms. The asymptotic distribution of the PP test is the same as that of the ADF test, but it differs from the ADF test in how it handles serial correlation and heteroscedasticity of the random error term.
It tests the following null and alternative hypotheses:
H0 : The data have a unit root (non-stationary process)
H1 : The data do not have a unit root (stationary process)
6.4 Analysis of the stationary time series data at different sites
In this section we use the above-mentioned tests to see whether the data satisfy the assumption of stationarity. For this purpose, the data from all sites are analyzed in sequence, and the data found to be stationary are carried forward in the subsequent analysis.
6.4.1 Marala Site Flood Peaks

From the following table (a) we see that the Augmented Dickey Fuller unit root test statistic is -2.0725, which is smaller than the critical values at the different levels of significance (i.e. 5% and 10%), so we can reject H0 and conclude that the flood peaks at Marala do not have a unit root, i.e. the series is stationary. The PP test also rejects the null hypothesis and gives the same result.
Table (a)

p value

Statistic

0.03983

-2.0725324

8
0.03983

-2.0725324

critical value

Alpha

-1.9694528

5%

-1.6336516

10%

The autocorrelation function is another important tool for diagnosing the stationarity of a series. It indicates whether the time series drifts over time; a stationary series should fluctuate around a constant level throughout (Abbas Keshvani, 2013).
Figure 1 (a) Graph of ACF for Marala flood peaks (with 5% significance limits for the autocorrelations). (b) Graph of PACF for Marala flood peaks (with 5% significance limits for the partial autocorrelations).


As we see in the above graphs, the ACF in fig (a) and the PACF in fig (b) show a slight exponential decay and remain within the significance limits, which is an initial indication of a stationary series. In theory, the ACF of a purely random process should be close to zero at all non-zero lags. So the ACF and PACF suggest that the Marala flood peaks form a stationary series.
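The sample autocorrelation function and an approximate 5% significance band can be computed as in the following sketch. The band $\pm 1.96/\sqrt{n}$ is a common large-sample approximation and is an assumption here; the limits drawn by the statistical package used for the figures may be computed differently. The function name `sample_acf` and the alternating test series are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a series."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                 # centre the series
    denom = np.sum(xc ** 2)
    n = len(x)
    return np.array([np.sum(xc[: n - k] * xc[k:]) / denom
                     for k in range(max_lag + 1)])

# Hypothetical alternating series: strong negative lag-1 correlation.
x = np.array([1.0, -1.0] * 50)
acf = sample_acf(x, 3)
limit = 1.96 / np.sqrt(len(x))        # approximate 5% significance limit
```

For this series the lag-1 autocorrelation is -0.99, far outside the approximate limit of 0.196. Autocorrelations of a stationary, purely random series would instead fall mostly inside the band.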
Figure 2 Residual plots for Marala: normal probability plot, residuals versus fits, histogram of residuals, and residuals versus observation order.

6.4.2 TARBELA SITE

From the following table (b) we see that the Augmented Dickey Fuller unit root test statistic is -0.9024, which is greater than the critical values at the different levels of significance (i.e. 1%, 2%, 5% and 10%), so we cannot reject H0 and conclude that the flood peaks at Tarbela have a unit root, i.e. the series is non-stationary.
Table (b)

p value     Statistic    critical value    Alpha
0.329485    -0.9024      -2.72488          1%
                         -2.43209          2%
                         -2.01518          5%
                         -1.66269          10%

Figure 3 (a) Graph of ACF for Tarbela flood peaks (with 5% significance limits for the autocorrelations). (b) Graph of PACF for Tarbela flood peaks (with 5% significance limits for the partial autocorrelations).


As we see in the above graphs, the ACF in fig (a) and the PACF in fig (b) depart from exponential decay. Even though they remain within the significance limits, they vary widely across the lags. In theory, the ACF of a purely random process should be close to zero at all non-zero lags. So the ACF and PACF suggest that the Tarbela flood peaks form a non-stationary series.

Figure: Residual plots for Tarbela: normal probability plot, residuals versus fits, histogram of residuals, and residuals versus observation order.

6.4.3 SHAHDARA SITE

From the following table we see that the Augmented Dickey Fuller unit root test statistic is -3.030007, which is smaller than the critical values at the different levels of significance (i.e. 1%, 2%, 5% and 10%), so we can reject H0 and conclude that the flood peaks at Shahdara do not have a unit root, i.e. the series is stationary.
Table (6.4.3 a)

p value     Statistic     critical value    Alpha
0.003859    -3.030007     -2.628747         1%
                          -2.360146         2%
                          -1.969453         5%
                          -1.633652         10%

Figure (6.4.3 b) (a) Graph of ACF for Shahdara flood peaks (with 5% significance limits for the autocorrelations). (b) Graph of PACF for Shahdara flood peaks (with 5% significance limits for the partial autocorrelations).


As the graphs show, the ACF in fig (a) and the PACF in fig (b) decay roughly exponentially and remain within the significance limits, which is an initial indication of a stationary series. In theory, the ACF of a purely random process should be close to zero at every non-zero lag. The ACF and PACF therefore suggest that the Shahdara flood peaks form a stationary series.
Figure(6.4.3c): Residual Plots for Marala (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)

6.4.4 MANGLA SITE

Table (6.4.4a) shows that the Augmented Dickey-Fuller unit root test statistic is -3.74878, which is smaller than the critical values at every level of significance considered (1%, 2%, 5% and 10%), so we reject H0 and conclude that the flood peaks at Mangla do not have a unit root, i.e. the series is stationary.
Table (6.4.4a)
p value      Statistic     Critical value    Alpha
0.001        -3.74878      -2.62768          1%
                           -2.35935          2%
                           -1.96888          5%
                           -1.63328          10%

Figure (6.4.4b): (a) Graph of ACF for Mangla Flood peaks; (b) Graph of PACF for Mangla Flood peaks (autocorrelation and partial autocorrelation plotted against lag, with 5% significance limits)
As the graphs show, the ACF in fig (a) and the PACF in fig (b) decay roughly exponentially and remain within the significance limits, which is an initial indication of a stationary series. In theory, the ACF of a purely random process should be close to zero at every non-zero lag. The ACF and PACF therefore suggest that the Mangla flood peaks form a stationary series.

Figure: Residual Plots for Mangla (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)

6.4.5 MUZAFARABAD SITE

Table (6.4.5a) shows that the Augmented Dickey-Fuller unit root test statistic is -3.45, which is smaller than the critical values at every level of significance considered (1%, 2%, 5% and 10%), so we reject H0 and conclude that the flood peaks at the Muzafarabad site do not have a unit root, i.e. the series is stationary.
Table (6.4.5a)
p value      Statistic     Critical value    Alpha
0.003        -3.45         -2.96             1%
                           -2.4325           2%
                           -1.8765           5%
                           -1.4532           10%
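To show where a unit root statistic of this kind comes from, the sketch below computes the simple Dickey-Fuller t-statistic from the regression of the first difference on the lagged level. It is an illustration under stated assumptions: no constant and no lagged difference terms are included, so it is not the augmented test reported in the tables, and the two simulated series are hypothetical.

```python
import math
import random

def dickey_fuller_stat(series):
    """t-statistic of gamma in  dy_t = gamma * y_{t-1} + e_t  (no constant, no lags)."""
    lagged = series[:-1]
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    sxx = sum(y * y for y in lagged)
    gamma = sum(y * d for y, d in zip(lagged, diffs)) / sxx
    resid = [d - gamma * y for y, d in zip(lagged, diffs)]
    # Residual variance with one estimated coefficient.
    s2 = sum(e * e for e in resid) / (len(diffs) - 1)
    return gamma / math.sqrt(s2 / sxx)

random.seed(0)  # fixed seed so the illustration is repeatable
noise = [random.gauss(0.0, 1.0) for _ in range(200)]

stationary = noise            # purely random series
walk = []
level = 0.0
for e in noise:               # random walk: cumulative sum of the same noise
    level += e
    walk.append(level)

t_stationary = dickey_fuller_stat(stationary)
t_walk = dickey_fuller_stat(walk)
print("stationary series:", round(t_stationary, 2))
print("random walk:", round(t_walk, 2))
```

A strongly negative statistic, as for the purely random series, leads to rejection of the unit root hypothesis, while the random walk gives a statistic much closer to zero.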

Figure (6.4.5b): (a) Graph of ACF for Muzafarabad Site; (b) Graph of PACF for Muzafarabad Site (autocorrelation and partial autocorrelation plotted against lag, with 5% significance limits)

Table(6.4.5c): Residual Plots for Muzafarabad (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)

6.4.6 BALAKOT SITE

Table (6.4.6a) shows that the Augmented Dickey-Fuller unit root test statistic is -1.02467, which is larger than the critical values at every level of significance considered (1%, 2%, 5% and 10%), so we cannot reject H0 and conclude that the rainfall data at the Balakot site have a unit root, i.e. the series is non-stationary.
Table (6.4.6a)
p value      Statistic     Critical value    Alpha
0.35998      -1.02467      -2.99243          1%
                           -2.7204           2%
                           -2.3129           5%
                           -1.4568           10%

Figure: (a) Graph of ACF for Balakot Site; (b) Graph of PACF for Balakot Site (autocorrelation and partial autocorrelation plotted against lag, with 5% significance limits)

Figure: Residual Plots for Balakot (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)

6.4.7 RISALPUR SITE

Table (6.4.7a) shows that the Augmented Dickey-Fuller unit root test statistic is -1.6875, which is larger than the critical values at every level of significance considered (1%, 2%, 5% and 10%), so we cannot reject H0 and conclude that the rainfall at the Risalpur site has a unit root, i.e. the series is non-stationary.

Table (6.4.7a)
p value      Statistic     Critical value    Alpha
0.2973       -1.6875       -3.9926           1%
                           -3.7209           2%
                           -2.3124           5%
                           -1.9563           10%

Figure(6.4.7b): (a) Graph of ACF for Risalpur Site; (b) Graph of PACF for Risalpur Site (autocorrelation and partial autocorrelation plotted against lag, with 5% significance limits)

Figure (6.4.7c): Residual Plots for Risalpur (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)

6.4.8 KOHAT SITE

Table (6.4.8a) shows that the Augmented Dickey-Fuller unit root test statistic is -2.6875, which is larger than the critical values at every level of significance considered (1%, 2%, 5% and 10%), so we cannot reject H0 and conclude that the rainfall data at the Kohat site have a unit root, i.e. the series is non-stationary.
Table (6.4.8a)
p value      Statistic     Critical value    Alpha
0.2754       -2.6875       -4.7584           1%
                           -4.0234           2%
                           -3.3542           5%
                           -2.9867           10%

Figure: (a) Graph of ACF of Kohat Site; (b) Graph of PACF of Kohat Site (autocorrelation and partial autocorrelation plotted against lag, with 5% significance limits)

Figure: Residual Plots for Kohat (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)

6.4.9 DIR SITE

Table (6.4.9a) shows that the Augmented Dickey-Fuller unit root test statistic is -0.45634, which is larger than the critical values at every level of significance considered (1%, 2%, 5% and 10%), so we cannot reject H0 and conclude that the rainfall data at the Dir site have a unit root, i.e. the series is non-stationary.

Table (6.4.9a)
p value      Statistic     Critical value    Alpha
0.2254       -0.45634      -2.7432           1%
                           -1.9857           2%
                           -1.4562           5%
                           -0.9346           10%

Figure (6.4.9b): (a) Graph of ACF for Dir Site; (b) Graph of PACF for Dir Site (autocorrelation and partial autocorrelation plotted against lag, with 5% significance limits)

Figure(6.4.9c): Residual Plots for Dir (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)

Chap No 7
Probability distribution
7.1 Introduction

In this section we apply the selected distributions to the flood and rainfall data of the different sites to see which distribution is the most appropriate and gives a good fit. For this purpose, the chi-square test is used as a goodness-of-fit test. The method of maximum likelihood is used for estimating the parameters of the selected distributions.
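The procedure outlined above can be sketched for the simplest case, the exponential distribution, whose maximum likelihood estimate of the rate is the reciprocal of the sample mean. The sketch assumes classes of equal model probability and a hypothetical sample; the function names are introduced only for illustration, and the binning used by the fitting software may differ. The statistic is normally compared with a chi-square critical value whose degrees of freedom are the number of classes minus one minus the number of estimated parameters.

```python
import math

def exponential_mle_rate(data):
    """Maximum likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(data) / sum(data)

def chi_square_statistic(data, cdf, n_bins):
    """Pearson chi-square statistic using n_bins classes of equal model probability."""
    n = len(data)
    expected = n / n_bins
    counts = [0] * n_bins
    for x in data:
        # cdf(x) lies in [0, 1); it selects the equal-probability class of x.
        index = min(int(cdf(x) * n_bins), n_bins - 1)
        counts[index] += 1
    return sum((obs - expected) ** 2 / expected for obs in counts)

# Hypothetical sample (assumption): exact exponential quantiles with rate 0.5.
true_rate = 0.5
sample = [-math.log(1 - (i - 0.5) / 40) / true_rate for i in range(1, 41)]

rate = exponential_mle_rate(sample)
stat = chi_square_statistic(sample, lambda x: 1 - math.exp(-rate * x), n_bins=5)
print("estimated rate:", round(rate, 4))
print("chi-square statistic:", round(stat, 4))
```

A small statistic relative to the critical value indicates that the fitted distribution is not rejected.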
7.2 Marala Flood peaks

The following table contains the descriptive statistics summary of the Marala flood peaks.
Table 7.1
Statistic                  Values          Percentiles      Values
Size                       88              Min              109780
Range                      990220          5%               130730
Mean                       334550          10%              145650
Variance                   44064000000     25% (Q1)         194640
Standard deviation         209920          50% (Median)     255450
Coefficient of Variation   0.62745         75% (Q3)         411760
Standard error             22377           90%              721300
Skewness                   1.5196          95%              800040
Excess Kurtosis            1.8119          Max              1100000

7.2.1 Exponential Distribution

Table (7.2.1a)
Parameters   Values
rate         2.9891E-6

Graph(7.2.1b): (a) Histogram of exponential distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of exponential distribution (Quantile (Model) against x)
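As a check on the fitted exponential parameter, the short sketch below recomputes the rate as the reciprocal of the Marala sample mean given in Table 7.1 and evaluates the model density and quantile function. The function names are hypothetical; only the mean of 334550 is taken from the text.

```python
import math

mean_marala = 334550.0        # sample mean from Table 7.1
rate = 1.0 / mean_marala      # exponential MLE: reciprocal of the sample mean

def exponential_pdf(x, lam):
    """Density of the exponential distribution with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_quantile(p, lam):
    """Inverse CDF, the kind of model quantile plotted in a Q-Q plot."""
    return -math.log(1.0 - p) / lam

median_model = exponential_quantile(0.5, rate)
print(f"rate = {rate:.4E}")
print(f"model median = {median_model:.0f}")
```

The recomputed rate agrees with the tabulated parameter to the precision shown, and the model median can be set against the sample median of 255450 in Table 7.1.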


Table (7.2.1c)
Test                  Test statistics   P values      Critical value   Conclusion
Kolmogorov-Smirnov    0.28981           0.0000005     0.14274          Reject at different levels of significance
Anderson-Darling      9.2815                          2.5018           Reject at different levels of significance
Chi-Squared           43.038            0.00000012    12.592           Reject at different levels of significance

7.2.2 Gamma Distribution

Table (7.2.2a)
Parameters   Values
shape        2.5401
scale        1.3171E+5
Table (7.2.2b)
(a) Histogram of Gamma distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Gamma distribution (Quantile (Model) against x)

Table (7.2.2c)
Test                  Test statistics   P values      Critical value   Status
Kolmogorov-Smirnov    0.11674           0.16777       0.14274          Reject at different levels of significance
Anderson-Darling      9.2815                          2.5018           Reject at different levels of significance
Chi-Squared           43.038            1.1466E-7     12.592           Reject at different levels of significance
At the 20% level of significance the Kolmogorov-Smirnov test is accepted.

7.2.3 Normal Distribution

Table (7.2.3a)
Parameters            Values
standard deviation    2.0992E+5
mean                  3.3455E+5
Table(7.2.3b)

(a) Histogram of Normal distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Normal distribution (Quantile (Model) against x)

Table (7.2.3c)
Test                  Test statistics   P values      Critical value   Status
Kolmogorov-Smirnov    0.19741           0.00176       0.14274          Reject at different levels of significance
Anderson-Darling      5.5215                          2.5018           Reject at different levels of significance
Chi-Squared           34.169            0.0000007     9.4877           Reject at different levels of significance

7.2.4 Log-Normal Distribution

Table (7.2.4a)
Parameters         Values
sigma (of ln x)    0.5427
mu (of ln x)       12.562
Table (7.2.4b)
(a) Histogram of Log Normal distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Log Normal distribution (Quantile (Model) against x)

Table (7.2.4c)
Test                  Test statistics   P values   Critical value                  Status
Kolmogorov-Smirnov    0.10937           0.2261     0.14274                         Accepted at different levels of significance
Anderson-Darling      1.2029                       2.5018                          Accepted at different levels of significance
Chi-Squared           10.572            0.10253    8.558 (20%), 10.645 (<20%)      Reject only at 20% level of significance and accepted at 1%, 2%, 5% and 10%

7.2.5 Logistic Distribution

Table (7.2.5a)
Parameters   Values
scale        1.1573E+5
location     3.3455E+5
Table (7.2.5b)

(a) Histogram of Logistic distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Logistic distribution (Quantile (Model) against x)

Table (7.2.5c)
Test                  Test statistics   P values     Critical value   Status
Kolmogorov-Smirnov    0.20718           0.000862     0.14274          Reject at different levels of significance
Anderson-Darling      5.4775                         2.5018           Reject at different levels of significance
Chi-Squared           24.34             0.0000683    9.4877           Reject at different levels of significance

7.2.6 Nakagami Distribution

Table (7.2.6a)
Parameters        Values
m                 0.53603
spread (Omega)    1.5549E+11
Table (7.2.6b)
(a) Histogram of Nakagami distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Nakagami distribution (Quantile (Model) against x)

Table (7.2.6c)
Test                  Test statistics   P values    Critical value   Status
Kolmogorov-Smirnov    0.2089            0.007578    0.14274          Reject at different levels of significance
Anderson-Darling      4.8927                        2.5018           Reject at different levels of significance
Chi-Squared           27.636            0.0011      12.592           Reject at different levels of significance

7.2.7 Weibull Distribution

Table (7.2.7a)
Parameters   Values
shape        2.1248
scale        3.6554E+5
Table (7.2.7b)

(a) Histogram of Weibull distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Weibull distribution (Quantile (Model) against x)

Table (7.2.7c)
Test                  Test statistics   P values    Critical value                 Status
Kolmogorov-Smirnov    0.1608            0.01867     0.17126 (1%), 0.15961 (2%)     Only accepted at 1% level of significance and rejected at 2%, 5% and 10% etc.
Anderson-Darling      4.1265                        2.5018                         Reject at different levels of significance
Chi-Squared           14.914            0.01074     15.086 (1%), 11.388 (2%)       Only accepted at 1% level of significance and rejected at 2%, 5% and 10% etc.

7.2.8 Inverse Gaussian distribution

Table (7.2.8a)
Parameters        Values
lambda (shape)    8.4978E+5
mu (mean)         3.3455E+5
Table (7.2.8b)

(a) Histogram of Inverse Gaussian distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Inverse Gaussian distribution (Quantile (Model) against x)

Table (7.2.8c)
Test                  Test statistics   P values    Critical value   Status
Kolmogorov-Smirnov    0.09508           0.38004     0.14274          Accepted at different levels of significance
Anderson-Darling      0.98598                       2.5018           Accepted at different levels of significance
Chi-Squared           8.3096            0.21629     12.592           Accepted at different levels of significance

7.2.10 Rayleigh Distribution
Parameters       Values
scale (sigma)    2.6693E+5
Table (7.2.10a)

(a) Histogram of Rayleigh distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Rayleigh distribution (Quantile (Model) against x)

Table (7.2.10b)
Test                  Test statistics   P values    Critical value                Status
Kolmogorov-Smirnov    0.16973           0.01099     0.17126 (1%), 0.1596 (2%)     Only accepted at 1% level of significance and rejected at 2%, 5% and 10% etc.
Anderson-Darling      3.7393                        3.9074 (1%), 3.2892 (2%)      Only accepted at 1% level of significance and rejected at 2%, 5% and 10% etc.
Chi-Squared           15.19             0.0188      16.812 (1%), 15.033 (2%)      Only accepted at 1% level of significance and rejected at 2%, 5% and 10% etc.

7.2.11 Generalized Extreme value Distribution

Table (7.2.11a)
Parameters   Values
shape (k)    0.27275
scale        1.1212E+5
location     2.2895E+5
Table (7.2.11b)

(a) Histogram of GEV distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of GEV distribution (Quantile (Model) against x)

Table (7.2.11c)
Test                  Statistic   P values    Critical value   Status
Kolmogorov-Smirnov    0.07407     0.69163     0.14274          Accepted at different levels of significance
Anderson-Darling      0.58447                 2.5018           Accepted at different levels of significance
Chi-Squared           2.3871      0.88088     12.592           Accepted at different levels of significance

7.2.12 Frechet Distribution
Table (7.2.12a)
Parameters   Values
shape        2.3386
scale        2.3344E+5
location     11395.0
Table (7.2.12b)

(a) Histogram of Frechet distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Frechet distribution (Quantile (Model) against x)

Table (7.2.12c)
Test                  Statistic   P values   Critical value   Status
Kolmogorov-Smirnov    0.05098     0.9556     0.1427           Accepted at different levels of significance
Anderson-Darling      0.33501                2.5018           Accepted at different levels of significance
Chi-Squared           4.3167      0.7184     12.59            Accepted at different levels of significance

Conclusion

The log-normal and inverse Gaussian distributions are found to fit the Marala site well. For the log-normal distribution, the chi-square goodness-of-fit test rejects the fit only at the 20% level of significance and accepts it at the lower levels. The Rayleigh and Weibull distributions are accepted only at the 1% level of significance. The generalized extreme value, inverse Gaussian and Frechet distributions appear to be a good fit for the Marala site.
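The accept or reject statements in the Kolmogorov-Smirnov rows of the tables follow from comparing the statistic with the critical value. A minimal sketch of how the statistic is obtained from a sample and a model CDF is given below; the function names and the five-point sample are hypothetical, while the two statistics and the critical value 0.14274 are those reported for Marala.

```python
def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_decision(statistic, critical_value):
    """The fit is rejected when the statistic exceeds the critical value."""
    return "reject" if statistic > critical_value else "accept"

# Hypothetical sample on (0, 1) compared with the uniform CDF F(x) = x.
d_uniform = ks_statistic([0.1, 0.3, 0.5, 0.7, 0.9], lambda x: x)
print("illustrative distance:", round(d_uniform, 3))

# Reported Marala statistics against the reported critical value.
print("exponential:", ks_decision(0.28981, 0.14274))                 # reject
print("generalized extreme value:", ks_decision(0.07407, 0.14274))   # accept
```

The smaller the distance, the closer the fitted distribution follows the empirical distribution of the flood peaks.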
7.3 Shahdara Flood Peaks

The descriptive measures for the Shahdara flood site are as follows:

Table (7.3.1)
Statistic                  Values          Percentiles      Values
Size                       88              Min              22205
Range                      553800          5%               28450.0
Mean                       89196.0         10%              32609.0
Variance                   7639100000      25% (Q1)         45555
Standard deviation         87402.0         50% (Median)     60000
Coefficient of Variation   0.97988         75% (Q3)         97875.0
Standard error             9317.1          90%              182440
Skewness                   3.8177          95%              213750
Excess Kurtosis            17.989          Max              576000

7.3.1 Exponential Distribution

Parameters   Values
rate         1.1211E-5

(a) Histogram of Exponential distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Exponential distribution (Quantile (Model) against x)

Table (7.3.1b)
Test                  Statistic   P values     Critical value   Status
Kolmogorov-Smirnov    0.23883     6.6876E-5    0.14274          Reject at different levels of significance
Anderson-Darling      7.2808                   2.5018           Reject at different levels of significance
Chi-Squared           53.271      1.0346E-9    12.592           Reject at different levels of significance

7.3.2 Gamma Distribution

Table (7.3.2a)
Parameters   Values
shape        1.0415
scale        85644.0
Table (7.3.2b)

(a) Histogram of Gamma distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Gamma distribution (Quantile (Model) against x)

Table (7.3.2c)
Test                  Statistic   P values     Critical value   Status
Kolmogorov-Smirnov    0.23068     1.3389E-4    0.14274          Reject at different levels of significance
Anderson-Darling      6.8804                   2.5018           Reject at different levels of significance
Chi-Squared           50.92       3.0741E-9    12.592           Reject at different levels of significance

7.3.3 Normal Distribution

Table (7.3.3a)
Parameters            Values
standard deviation    87402.0
mean                  89196.0
Table (7.3.3b)

(a) Histogram of Normal distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Normal distribution (Quantile (Model) against x)

Table (7.3.3c)
Test                  Test Statistic   P values     Critical value   Status
Kolmogorov-Smirnov    0.24227          4.9499E-5    0.14274          Reject at different levels of significance
Anderson-Darling      9.6237                        2.5018           Reject at different levels of significance
Chi-Squared           48.39            2.9566E-9    11.07            Reject at different levels of significance

7.3.4 Log-Normal Distribution

Table (7.3.4a)
Parameters         Values
sigma (of ln x)    0.63479
mu (of ln x)       11.151
Table (7.3.4b)

(a) Histogram of Log-Normal distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Log-Normal distribution (Quantile (Model) against x)

Table (7.3.4c)
Test                  Statistic   P values   Critical value                 Status
Kolmogorov-Smirnov    0.12515     0.11646    0.11248 (20%), 0.1285 (20%)    Only rejected at 20% level of significance and accepted at 1%, 2%, 5% and 10% level of significance
Anderson-Darling      1.5027                 1.9286 (20%)                   Only rejected at 20% level of significance and accepted at 1%, 2%, 5% and 10% level of significance
Chi-Squared           19.857      0.00294    12.592                         Rejected at different level of significance

7.3.5 Logistic Distribution

Table (7.3.5a)
Parameters   Values
scale        48187.0
location     89196.0
Table (7.3.5b)

(a) Histogram of Logistic distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Logistic distribution (Quantile (Model) against x)

Table (7.3.5c)
Test                  Statistic   P values     Critical value   Status
Kolmogorov-Smirnov    0.24277     4.7379E-     0.14274          Rejected at different level of significance
Anderson-Darling      8.7112                   2.5018           Rejected at different level of significance
Chi-Squared           40.643      3.4048E-7    12.592           Rejected at different level of significance

7.3.6 Weibull Distribution

Table (7.3.6a)
Parameters   Values
shape        1.8529
scale        91883.0
Table (7.3.6b)

(a) Histogram of Weibull distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Weibull distribution (Quantile (Model) against x)

Table (7.3.6c)
Test                  Test Statistic   P values   Critical value               Status
Kolmogorov-Smirnov    0.17122          0.01003    0.17126 (1%), 0.159 (2%)     Only accepted at 1% level of significance and rejected at 2%, 5% and 10% etc.
Anderson-Darling      4.8786                      2.5018                       Rejected at different level of significance
Chi-Squared           16.301           0.00604    11.07                        Rejected at different level of significance

7.3.7 Inverse Gaussian Distribution:

Table (7.3.7a)
Parameters        Values
lambda (shape)    92896.0
mu (mean)         89196.0
Table (7.3.7b)

(a) Histogram of Inverse Gaussian distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Inverse Gaussian distribution (Quantile (Model) against x)

Table (7.3.7c)
Test                  Statistic   P values   Critical value               Status
Kolmogorov-Smirnov    0.18289     0.00477    0.14274                      Rejected at different level of significance
Anderson-Darling      3.7881                 3.9074 (1%), 3.2892 (2%)     Only accepted at 1% level of significance and rejected at 2%, 5% and 10% etc.
Chi-Squared           22.227      0.0011     12.592                       Rejected at different level of significance

7.3.8 Rayleigh Distribution

Table (7.3.8a)
Parameters       Values
scale (sigma)    71168.0
Table (7.3.8b)

(a) Histogram of Rayleigh distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Rayleigh distribution (Quantile (Model) against x)

Table (7.3.8c)
Test                  Statistic   P values     Critical value   Status
Kolmogorov-Smirnov    0.23978     6.1559E-     0.14274          Rejected at different level of significance
Anderson-Darling      8.0591                   2.5018           Rejected at different level of significance
Chi-Squared           29.91       4.0891E-5    12.592           Rejected at different level of significance

7.3.9 Generalized Extreme value Distribution

Table (7.3.9a)
Parameters   Values
K            0.45104
scale        26675.0
location     52585.0
Table (7.3.9b)

(a) Histogram of GEV distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of GEV distribution (Quantile (Model) against x)

Test                  Statistic   P values   Critical value   Status
Kolmogorov-Smirnov    0.06397     0.8413     0.14274          Accepted at different level of significance
Anderson-Darling      0.38742                2.5018           Accepted at different level of significance
Chi-Squared           6.0328      0.41953    12.592           Accepted at different level of significance

7.3.10 Frechet Distribution

Table (7.3.10a)
Parameters   Values
shape        2.1416
scale        57449.0
location     4775.4
Table (7.3.10b)

(a) Histogram of Frechet distribution (Probability Density Function: f(x) against x); (b) Q-Q Plot of Frechet distribution (Quantile (Model) against x)

Table (7.3.10c)
Test                  Statistic   P values   Critical value   Status
Kolmogorov-Smirnov    0.06598     0.75404    0.1427           Accepted at different level of significance
Anderson-Darling      0.38143                2.5018           Accepted at different level of significance
Chi-Squared           4.8019      0.5497     12.592           Accepted at different level of significance

Conclusion

The log-normal and inverse Gaussian distributions are very sensitive to these data: the Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests are significant only at the 20% level of significance, while the chi-square test rejects the fit at every level of significance considered. The GEV and Frechet distributions are a good fit to the data.
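Since the conclusion above relies on the Anderson-Darling test, a minimal sketch of its statistic is given here for a fully specified model CDF. The function name and the five-point sample are hypothetical; the critical value 2.5018 is the one quoted in the tables, and the software's treatment of estimated parameters may differ from this plain form.

```python
import math

def anderson_darling(data, cdf):
    """Anderson-Darling statistic A^2 for a fully specified model CDF.

    Assumes 0 < cdf(x) < 1 for every observation, otherwise the
    logarithms below are undefined.
    """
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        f_low = cdf(xs[i - 1])    # model CDF at the i-th smallest value
        f_high = cdf(xs[n - i])   # model CDF at the i-th largest value
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n

sample = [0.1, 0.3, 0.5, 0.7, 0.9]                     # hypothetical data on (0, 1)
a2_good = anderson_darling(sample, lambda x: x)        # uniform model matches the data
a2_poor = anderson_darling(sample, lambda x: x ** 3)   # deliberately poor model

critical = 2.5018   # critical value quoted in the tables
print("good model:", round(a2_good, 3), "reject" if a2_good > critical else "accept")
print("poor model:", round(a2_poor, 3), "reject" if a2_poor > critical else "accept")
```

Because the statistic weights the tails heavily, it is sensitive to the largest flood peaks, which is one reason it can disagree with the chi-square test.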
7.4 Mangla Flood Peaks

Table (7.4.1)
| Statistic | Value | Percentile | Value |
|---|---|---|---|
| Size | 89 | Min | 22400 |
| Range | 1067600 | 5% | 76400.0 |
| Mean | 234200 | 10% | 88190 |
| Variance | 43733000000 | 25% (Q1) | 117230 |
| Standard deviation | 209120 | 50% (Median) | 157000 |
| Coefficient of Variation | 0.89294 | 75% (Q3) | 274350 |
| Standard error | 22167.0 | 90% | 440680 |
| Skewness | 2.6163 | 95% | 760250 |
| Excess Kurtosis | 7.378 | Max | 1090000 |

7.4.1 Exponential Distribution

Table (7.4.1a)
| Parameters | Values |
|---|---|
| λ | 4.2699E-6 |
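As a quick check on the exponential parameter, the fitted rate of an exponential distribution is the reciprocal of the sample mean. The sketch below is only an illustration using the mean reported in Table (7.4.1); the variable names are assumptions, not part of the original analysis.

```python
# Sample mean of the Mangla flood peaks, as reported in Table (7.4.1).
mean_peak = 234200.0

# For the exponential density f(x) = lam * exp(-lam * x), the fitted
# rate is the reciprocal of the sample mean.
lam = 1.0 / mean_peak

print(f"lambda = {lam:.4e}")  # prints lambda = 4.2699e-06
```

The result agrees with the tabulated parameter value 4.2699E-6.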
Table (7.4.1b)
(a) Histogram of Exponential distribution (fitted Exponential density, f(x) against x)
(b) Q-Q Plot of Exponential distribution (Quantile (Model) against x)

Table (7.4.1c)
| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.24434 | 3.6524E-5 | 0.14195 | Rejected at different levels of significance |
| Anderson-Darling | 6.1768 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 34.051 | 2.3256E-6 | 11.07 | Rejected at different levels of significance |

7.4.2 Gamma Distribution
MLE = 1.0e+05=1.1245
Parameter values: 4.2699E-6, 1.8673E+5

[Figure: Probability Density Function (histogram with fitted Gamma density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.19605 | 0.0018 | 0.14195 | Rejected at different levels of significance |
| Anderson-Darling | 4.2518 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 22.805 | 3.6774E-4 | 11.07 | Rejected at different levels of significance |

7.4.3 Normal Distribution

Parameters: σ = 2.0912E+5, μ = 2.3420E+5
MLE: μ = 2.3420E+5, σ = 2.0795E+5

[Figure: Probability Density Function (histogram with fitted Normal density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.2252 | 1.9015E-4 | 0.14195 | Rejected at different levels of significance |
| Anderson-Darling | 8.5233 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 45.1 | 1.3845E-8 | 11.07 | Rejected at different levels of significance |

7.4.4 Log-Normal Distribution

Parameters: σ = 0.68641, μ = 12.105
MLE: 12.105, 0.6903

[Figure: Probability Density Function (histogram with fitted Lognormal density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.08973 | 0.44483 | 0.14195 | Accepted at different levels of significance |
| Anderson-Darling | 0.93497 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 11.882 | 0.06466 | 12.592 | Accepted at different levels of significance |
7.4.5 Logistic Distribution

Parameters: σ = 1.1530E+5, μ = 2.3420E+5
1.9489 0.8898

[Figure: Probability Density Function (histogram with fitted Logistic density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.22296 | 2.2843E-4 | 0.14195 | Rejected at different levels of significance |
| Anderson-Darling | 7.5814 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 40.742 | 1.0578E-7 | 11.07 | Rejected at different levels of significance |

7.4.6 Weibull Distribution

Parameters: α = 1.7178, β = 2.4515E+5
MLE: 2.5814E+5

[Figure: Probability Density Function (histogram with fitted Weibull density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.13959 | 0.0564 | 20% (0.11186), 10% (0.12779), 5% (0.14195), 2% (0.15873), 1% (0.17031) | Rejected at the 20% and 10% levels of significance; accepted at the 5%, 2% and 1% levels of significance |
| Anderson-Darling | 3.5537 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 15.287 | 0.01814 | 2% (15.033), 1% (16.812) | Accepted only at the 1% level of significance; rejected at 2%, 5% and 10% etc. |

7.4.7 Inverse Gaussian Distribution

Parameters: λ = 2.9373E+5, μ = 2.3420E+5
MLE: 2.3420, 3.7408

[Figure: Probability Density Function (histogram with fitted Inv. Gaussian density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.12089 | 0.1365 | 20% (0.11186), 10% (0.12779) | Rejected only at the 20% level; accepted at 5%, 2% and 1% |
| Anderson-Darling | 1.7242 | | 20% (1.3749), 10% (1.9286) | Rejected only at the 20% level; accepted at 5%, 2% and 1% |
| Chi-Squared | 4.7111 | 0.45214 | 5% (11.07) | Accepted at different levels of significance |
7.4.8 Rayleigh Distribution

Parameters: σ = 1.8686E+5
MLE = 2.2146e+05

[Figure: Probability Density Function (histogram with fitted Rayleigh density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.21416 | 4.6195E-4 | 0.14195 | Rejected at different levels of significance |
| Anderson-Darling | 7.3189 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 15.801 | 0.00744 | 11.07 | Rejected at different levels of significance |

7.4.9 Generalized Extreme Value Distribution

Parameters: k = 0.3965, σ = 78946.0, μ = 1.3838E+5

[Figure: Probability Density Function (histogram with fitted Gen. Extreme Value density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.05547 | 0.93281 | 20% (0.11186), 10% (0.12779), 5% (0.14195), 2% (0.15873), 1% (0.17031) | Accepted at different levels of significance |
| Anderson-Darling | 0.36255 | | 20% (1.3749), 10% (1.9286), 5% (2.5018), 2% (3.2892), 1% (3.9074) | Accepted at different levels of significance |
| Chi-Squared | 0.61399 | 0.99616 | 20% (8.5581), 10% (10.645), 5% (12.592), 2% (15.033), 1% (16.812) | Accepted at different levels of significance |
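The accept/reject statements in these tables follow from comparing the test statistic with the critical value at each significance level: the fit is rejected when the statistic exceeds the critical value. The following is a minimal sketch of that comparison for the Kolmogorov-Smirnov test, using the critical values listed above for the Mangla data; the function name `ks_decisions` is hypothetical and introduced only for illustration.

```python
# Kolmogorov-Smirnov critical values for the Mangla data (n = 89),
# as listed in the table above: significance level -> critical value.
ks_critical = {0.20: 0.11186, 0.10: 0.12779, 0.05: 0.14195,
               0.02: 0.15873, 0.01: 0.17031}

def ks_decisions(statistic, critical_values):
    """Return 'rejected' or 'accepted' for each significance level."""
    return {alpha: ("rejected" if statistic > crit else "accepted")
            for alpha, crit in critical_values.items()}

gev = ks_decisions(0.05547, ks_critical)      # GEV statistic (Section 7.4.9)
weibull = ks_decisions(0.13959, ks_critical)  # Weibull statistic (Section 7.4.6)

for alpha in ks_critical:
    print(f"alpha = {alpha:.2f}: GEV {gev[alpha]}, Weibull {weibull[alpha]}")
```

Under this rule the GEV fit is accepted at every listed level, whereas the Weibull fit is rejected at the 20% and 10% levels and accepted at 5%, 2% and 1%, matching the statuses reported in the tables.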
7.4.10 Frechet Distribution

Parameter values: 1.3727, 1.0992E+5, 21478.0

[Figure: Probability Density Function (histogram with fitted Frechet density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.08345 | 0.737 | 0.1419 | Accepted at different levels of significance |
| Anderson-Darling | 5.0444 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 3.4969 | 0.7444 | 12.592 | Accepted at different levels of significance |

Conclusion
The log-normal distribution seems to be a good fit for the Mangla site. For the Inverse Gaussian distribution, the Kolmogorov-Smirnov and Anderson-Darling GOF tests reject the fit only at the 20% level of significance, and the chi-square test accepts it at all levels of significance, so the Inverse Gaussian distribution is also a good fit to the data. The GEV and Frechet distributions are also good fits for the Mangla site.
7.5 Kohat

| Statistic | Value |
|---|---|
| Size | 36 |
| Range | 94.89 |
| Mean | 143.74 |
| Variance | 494.72 |
| Standard deviation | 22.242 |
| Coefficient of Variation | 0.15474 |
| Standard error | 3.7071 |
| Skewness | 1.0118 |
| Excess Kurtosis | 0.85693 |

Percentiles: Min, 5%, 10%, 25% (Q1), 50% (Median), 75% (Q3), 90%, 95%, Max
Values: 115.45, 116.53, 126.57, 139.5, 155.25, 175.86, 188.38, 210.34

7.5.1 Exponential Distribution

| Parameters | Values |
|---|---|
| λ | 0.00696 |

[Figure: Probability Density Function (histogram with fitted Exponential density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.55209 | 2.0907E-10 | 0.22119 | Rejected at different levels of significance |
| Anderson-Darling | 12.19 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 81.348 | 1.1102E-16 | 9.4877 | Rejected at different levels of significance |

7.5.2 Gamma Distribution

Parameters: α = 41.766, β = 3.4417

[Figure: Probability Density Function (histogram with fitted Gamma density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.13474 | 0.48888 | 0.22119 | Accepted at different levels of significance |
| Anderson-Darling | 0.61054 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 0.72191 | 0.9486 | 9.4877 | Accepted at different levels of significance |

7.5.3 Normal Distribution

Parameters: μ = 143.74, σ = 22.242

[Figure: Probability Density Function (histogram with fitted Normal density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.15494 | 0.31941 | 0.22119 | Accepted at different levels of significance |
| Anderson-Darling | 0.8509 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 4.9146 | 0.29618 | 9.4877 | Accepted at different levels of significance |

7.5.4 Log-Normal Distribution

Parameters: μ = 4.9572, σ = 0.14529

[Figure: Probability Density Function (histogram with fitted Lognormal (3P) density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.085523 | 0.93628 | 0.22119 | Accepted at different levels of significance |
| Anderson-Darling | 0.28257 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 1.8966 | 0.59413 | 7.8147 | Accepted at different levels of significance |

7.5.5 Logistic Distribution

Parameters: σ = 12.265, μ = 143.74

[Figure: Probability Density Function (histogram with fitted Logistic density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.0852 | 0.93628 | 0.22119 | Accepted at different levels of significance |
| Anderson-Darling | 0.28257 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 1.8966 | 0.59413 | 7.8147 | Accepted at different levels of significance |

7.5.6 Nakagami Distribution

Parameters: m = 9.4017, Ω = 21143.0

[Figure: Probability Density Function (histogram with fitted Nakagami density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.14202 | 0.42288 | 0.22119 | Accepted at different levels of significance |
| Anderson-Darling | 0.7219 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 4.7522 | 0.31368 | 9.4877 | Accepted at different levels of significance |
7.5.7 Weibull Distribution

Parameters: α = 8.0821, β = 150.33

[Figure: Probability Density Function (histogram with fitted Weibull density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.14942 | 0.36137 | 0.22119 | Accepted at different levels of significance |
| Anderson-Darling | 1.6686 | | 1.3749 | Accepted at different levels of significance |
| Chi-Squared | 3.27 | 0.35184 | 7.8147 | Accepted at different levels of significance |

7.5.8 Inverse Gaussian Distribution

Parameters: λ = 6003.5, μ = 143.74

[Figure: Probability Density Function (histogram with fitted Inv. Gaussian density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.15524 | 0.31724 | 0.22119 | Accepted at different levels of significance |
| Anderson-Darling | 0.73821 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 3.6163 | 0.46042 | 9.4877 | Accepted at different levels of significance |

7.5.9 Rayleigh Distribution

Parameters: σ = 114.69

[Figure: Probability Density Function (histogram with fitted Rayleigh density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.39748 | 1.2129E-5 | 0.22119 | Rejected at different levels of significance |
| Anderson-Darling | 7.1506 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 33.736 | 2.2525E-7 | 7.8147 | Rejected at different levels of significance |

7.5.10 Generalized Extreme Value Distribution

Parameters: k = 0.06266, σ = 16.71, μ = 133.0

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.07291 | 0.98347 | 0.22119 | Rejected at different levels of significance |
| Anderson-Darling | 0.2401 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 0.21649 | 0.99455 | 9.4877 | Rejected at different levels of significance |

7.5.11 Frechet Distribution

Parameter values: 7.0341, 107.41, 25.242

[Figure: Probability Density Function (histogram with fitted Frechet (3P) density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.08548 | 0.9348 | 0.2212 | Rejected at different levels of significance |
| Anderson-Darling | 0.2981 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 1.6054 | 0.6581 | 7.8147 | Rejected at different levels of significance |

Conclusion
The Gamma distribution gives insignificant results in all three GOF tests (Anderson-Darling, Kolmogorov-Smirnov and chi-square), so it is concluded to be a good fit for the rainfall data of the Kohat site. The Normal, Log-normal, Logistic, Nakagami and Inverse Gaussian distributions also support the null hypothesis in all three GOF tests.
7.6 MUZAFFARABAD RAINFALL SITE

| Statistic | Value | Percentile | Value |
|---|---|---|---|
| Size | 56 | Min | 80.84 |
| Range | 98.59 | 5% | 90.476 |
| Mean | 126.98 | 10% | 99.366 |
| Variance | 470.81 | 25% (Q1) | 111.61 |
| Standard deviation | 21.698 | 50% (Median) | 124.37 |
| Coefficient of Variation | 0.17088 | 75% (Q3) | 143.27 |
| Standard error | 2.8995 | 90% | 152.87 |
| Skewness | 0.24919 | 95% | 168.92 |
| Excess Kurtosis | -0.25919 | Max | 179.43 |

7.6.1 Exponential Distribution

| Parameters | Values |
|---|---|
| λ | 0.00788 |

[Figure: Probability Density Function (histogram with fitted Exponential density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.47721 | 6.9056E-12 | 0.17823 | Rejected at different levels of significance |
| Anderson-Darling | 17.902 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 117.14 | | 7.8147 | Rejected at different levels of significance |

7.6.2 Gamma Distribution

Parameters: α = 34.247, β = 3.7077

[Figure: Probability Density Function (histogram with fitted Gamma density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.07178 | 0.9152 | 0.17823 | Accepted at different levels of significance |
| Anderson-Darling | 0.27545 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 3.103 | 0.68411 | 11.07 | Accepted at different levels of significance |

7.6.3 Normal Distribution

Parameters: σ = 21.698, μ = 126.98

[Figure: Probability Density Function (histogram with fitted Normal density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.07918 | 0.8466 | 0.17823 | Accepted at different levels of significance |
| Anderson-Darling | 0.36388 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 3.7319 | 0.4435 | 9.4877 | Accepted at different levels of significance |

7.6.4 Log-Normal Distribution

Parameters: σ = 0.17118, μ = 4.8295

[Figure: Probability Density Function (histogram with fitted Lognormal density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.07666 | 0.87204 | 0.17823 | Accepted at different levels of significance |
| Anderson-Darling | 0.2812 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 2.5752 | 0.76513 | 11.07 | Accepted at different levels of significance |

7.6.5 Logistic Distribution

Parameters: μ = 126.98

[Figure: Probability Density Function (histogram with fitted Logistic density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.10072 | 0.5856 | 0.17823 | Accepted at different levels of significance |
| Anderson-Darling | 0.55964 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 5.4454 | 0.24457 | 9.4877 | Accepted at different levels of significance |

7.6.6 Weibull Distribution

Parameters: α = 6.9575, β = 134.57

[Figure: Probability Density Function (histogram with fitted Weibull density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.09036 | 0.71625 | 0.17823 | Accepted at different levels of significance |
| Anderson-Darling | 0.77165 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 6.3995 | 0.17123 | 9.4877 | Accepted at different levels of significance |

7.6.7 Inverse Gaussian Distribution

Parameters: λ = 4348.7, μ = 126.98

[Figure: Probability Density Function (histogram with fitted Inv. Gaussian density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.08824 | 0.74246 | 0.17823 | Accepted at different levels of significance |
| Anderson-Darling | 0.42201 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 5.2269 | 0.38882 | 11.07 | Accepted at different levels of significance |

7.6.8 Rayleigh Distribution

Parameters: σ = 101.32

[Figure: Probability Density Function (histogram with fitted Rayleigh density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.30846 | 3.1424E-5 | 0.17823 | Rejected at different levels of significance |
| Anderson-Darling | 9.6572 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 49.919 | 8.3130E-11 | 7.8147 | Rejected at different levels of significance |

7.6.9 Generalized Extreme Value Distribution

Parameters: k = 0.18291, σ = 20.578, μ = 118.3

[Figure: Probability Density Function (histogram with fitted Gen. Extreme Value density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.06799 | 0.94239 | 0.17823 | Accepted at different levels of significance |
| Anderson-Darling | 0.2588 | | 2.5018 | Accepted at different levels of significance |
| Chi-Squared | 3.0613 | 0.69054 | 11.07 | Accepted at different levels of significance |

7.6.10 Frechet Distribution

Parameter values: 7.0341, 107.41, 25.242

[Figure: Probability Density Function (histogram with fitted Frechet density, f(x) against x) and Q-Q Plot (Quantile (Model) against x)]

| Test | Test Statistic | P value | Critical value | Status |
|---|---|---|---|---|
| Kolmogorov-Smirnov | 0.1117 | 0.4537 | 0.1782 | Rejected at different levels of significance |
| Anderson-Darling | 1.188 | | 2.5018 | Rejected at different levels of significance |
| Chi-Squared | 2.402 | 0.8912 | 11.07 | Rejected at different levels of significance |

Conclusion
For the Gamma distribution, the results of the goodness-of-fit tests are in favour of the null hypothesis, so it seems to be a good fit for the Muzaffarabad site. The Normal and log-normal distributions are also accepted for these data. The Generalized Extreme Value, Inverse Gaussian and Weibull distributions are also found to be good for the Muzaffarabad rainfall station.
Overall Conclusion
Thirteen distributions were selected to carry out the analysis of the average annual rainfall and annual flood peak data. Among these distributions, some are found to be appropriate for the rainfall data and some for the flood peaks. Three distributions, namely the Inverse Gaussian, Generalized Extreme Value and Frechet distributions, are appropriate for the flood peak data, while the Gamma, Inverse Gaussian, Log-Normal and Weibull distributions are found to be good for the rainfall data. The Inverse Gaussian distribution can therefore be used for both the rainfall and the flood data.
Chapter 8
8.1 Introduction

In section 3 we estimated the worst rainfall and flood after n years. On the basis of this estimation, one can make decisions to guard against the ultimate disasters: raising the levee around the river to a higher level, renovating dams, constructing buildings and providing emergency protection can all be planned on the basis of this estimation.
8.2 A Decision Based upon Expected Loss

Let us consider the governmental decision as to whether or not to fund building a levee, or raising it to a higher level, along a river which is prone to annual flooding. There are three options for building or raising the levee by an amount \(\xi_i\) at a total cost of \(\$C_i\), where \(0 < \xi_1 < \xi_2 < \xi_3\) and \(C_1 < C_2 < C_3\).

The annual maximum flood heights for n years in the future are represented by the random variables \(\{X_j\}_{j=1}^{n}\), where \(X_j\) is the maximum height of the flood in the jth year. For the ith selection, \(\xi_i\) is the flood level the levee can control and thereby prevent damage to farm land and residences. During the jth year the damage that will be sustained is a function of

\[ (X_j-\xi_i)^+ = \begin{cases} X_j-\xi_i, & X_j > \xi_i \\ 0, & X_j \le \xi_i \end{cases} \tag{8.2.1} \]

and of the number of hectares inundated, which for simplicity we assume to be proportional to the water excess over the levee:

\[ \text{Hectares inundated} = a\,(X_j-\xi_i)^+ \]

However, there is an annual pay-off to the populace and the government when no homes have been destroyed and there has been no flooding damage, so that harvesting is accomplished. The value when no damage occurs, again assuming proportionality for simplicity, is \(\$h\) per hectare. Here h is the average monetary crop value, taking into account the variety of crops with different worth. This gives

\[ h\,a\,(X_j-\xi_i)^+ \]

as the economic loss from flooding. This loss must be measured against the cost of preventing flooding, namely \(\$C_i\), the total cost of the levee construction or renovation to height \(\xi_i\).


How is a decision to be made considering loss functions? How can the levee pay for itself? Considering an n-year period ahead, the construction expense will be recovered if and only if, for the designated levee height \(\xi_i\), we have

\[ h\,a \sum_{j=1}^{n} (X_j-\xi_{i-1})^+ > C_i \tag{8.2.2} \]

where \(\xi_0 = 0\) corresponds to leaving the levee as it is.

Assumption I

Let the \(\{X_j\}_{j=1}^{n}\) be independent and identically distributed; that is, the distribution of the annual spring flood is stationary and does not change over time. In this case we write \(X \sim F\), and we must compare values to see whether the value of the crops and homes saved exceeds the amortized cost, namely,

\[ E(X-\xi_{i-1})^+ > \frac{C_i}{n\,h\,a}, \qquad i = 1, 2, 3 \tag{8.2.3} \]

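We first obtain a simpler, more easily computable form. For a levee height \(\xi\) and a non-negative flood height X with survival function \(\bar F = 1 - F\),

\[ \begin{aligned}
E(X-\xi)^+ &= \int_{\xi}^{\infty} (x-\xi)\,dF(x) \\
&= \int_{\xi}^{\infty} x\,dF(x) - \xi\int_{\xi}^{\infty} dF(x) \\
&= E(X) - \int_{0}^{\xi} x\,dF(x) - \xi\,\bar F(\xi) \\
&= E(X) - \int_{0}^{\xi} \bar F(x)\,dx
\end{aligned} \tag{8.2.4} \]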
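Since E(X) equals the integral of the survival function over (0, ∞) for a non-negative X, the last form of (8.2.4) is the same as the integral of the survival function from the levee height to infinity, which is convenient for numerical work. The sketch below illustrates this under stated assumptions: the function name `expected_excess`, the midpoint rule, and the Pareto survival function with shape 3 and scale 1 are all chosen here for illustration and are not taken from the thesis data.

```python
def expected_excess(survival, xi, steps=100_000):
    """Approximate E(X - xi)^+ as the integral of survival(x) from xi to infinity.

    The substitution x = xi / u maps (xi, infinity) onto (0, 1); the
    midpoint rule is then applied on (0, 1). Requires xi > 0.
    """
    total = 0.0
    width = 1.0 / steps
    for i in range(steps):
        u = (i + 0.5) * width
        x = xi / u
        total += survival(x) * xi / (u * u) * width
    return total

# Hypothetical Pareto survival function: shape alpha = 3, scale k = 1.
alpha, k = 3.0, 1.0

def pareto_survival(x):
    return (k / x) ** alpha if x > k else 1.0

xi = 2.0  # hypothetical levee height
numeric = expected_excess(pareto_survival, xi)

# Closed form for the Pareto case (xi >= k, alpha > 1), used as a check.
closed_form = k ** alpha * xi ** (1.0 - alpha) / (alpha - 1.0)

print(numeric, closed_form)
```

For a given option, the resulting value would then be compared with the amortized cost per unit of crop value, as in (8.2.3).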
A closed form for equation (8.2.4) is difficult to obtain for distributions which can truly reflect the annual flood distribution, and one must often resort to numerical integration for its evaluation. If no flood exceeds the levee for a number of years, then we have a positive return to the state economy when, for some period of n years,

\[ E(X-\xi_{i-1})^+ > \frac{C_i}{n\,a\,h}, \qquad i = 1, 2, 3 \tag{8.2.5} \]

and no destructive flooding occurred because

\[ \max_{1 \le j \le n} X_j < \xi_i, \qquad i = 1, 2, 3. \]

Since the \(\{X_j\}_{j=1}^{n}\) are iid, \(Y_n = \max_{1 \le j \le n} X_j\) has its distribution given by

\[ P[Y_n \le \xi] = F^{n}(\xi). \]

So this is the probability that the return on the construction of the levee will be positive. Thus, based on records of flood heights over the preceding m years, we can estimate the parameters of a chosen representative distribution, such as perhaps the Pareto distribution, first used by the Italian economist Vilfredo Pareto, which can reflect a lot of variation.
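The probability that the largest of n independent annual maxima stays below a given level is the nth power of the one-year probability. The following minimal sketch shows how quickly this probability decays with the planning horizon; the one-year value 0.98 and the function name `prob_no_exceedance` are hypothetical choices for illustration only.

```python
def prob_no_exceedance(cdf_at_level, n_years):
    """P(Y_n <= level) = F(level) ** n for iid annual maxima."""
    return cdf_at_level ** n_years

# Hypothetical input: F(level) = 0.98, i.e. a 2% chance of the levee
# being overtopped in any single year.
for n in (1, 10, 50):
    print(n, round(prob_no_exceedance(0.98, n), 4))
```

Even a levee with a small annual exceedance probability is therefore quite likely to be overtopped at least once over a long horizon.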
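8.3 Pareto Distribution

The Pareto distribution is widely used in the literature as a model for the distribution of income and of the population of cities within specific years. A random variable X is said to have a Pareto distribution if its probability density function is

\[ f(x) = \frac{\alpha k^{\alpha}}{x^{\alpha+1}}; \qquad x \ge k; \quad \alpha, k > 0 \tag{8.3.1} \]

with distribution function

\[ F(x) = 1 - \left(\frac{k}{x}\right)^{\alpha} \tag{8.3.2} \]

and survival function

\[ \bar F(x) = e^{-\alpha \ln (x/k)} \tag{8.3.3} \]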
8.4 Maximum Likelihood Method of Pareto Distribution

The estimates of \(\alpha\) and \(k\) obtained by the ML method are those values which make the likelihood function \(L(x; \alpha, k)\) as large as possible with the available data.

The likelihood function of the Pareto distribution is as follows:

\[ L(x; \alpha, k) = \prod_{i=1}^{n} \frac{\alpha k^{\alpha}}{x_i^{\alpha+1}} \tag{8.4.4} \]

Calculus methods are usually involved in maximizing such functions. Here, however, no calculus is needed to observe that the likelihood function increases as k increases. Because the value of k can never be larger than the minimum of \(X_j\) for all j = 1, 2, 3, ..., n, the best way to maximize \(L(x; \alpha, k)\) is to choose k as follows:

\[ \hat{k} = \min_{1 \le j \le n} X_j \tag{8.4.5} \]

Since \(L(x; \alpha, k) > 0\), we can take the log of \(L(x; \alpha, k)\) in order to find the MLE of \(\alpha\), because differentiating \(\log L(x; \alpha, k)\) is likely to be more convenient than differentiating \(L(x; \alpha, k)\). The log-likelihood function is

\[ \log L(x; \alpha, k) = n\log\alpha + n\alpha\log k - (\alpha+1)\sum_{i=1}^{n}\log x_i \tag{8.4.6} \]

Differentiating the above equation with respect to \(\alpha\) we get

\[ \frac{\partial \log L(x; \alpha, k)}{\partial\alpha} = \frac{n}{\alpha} + n\log k - \sum_{i=1}^{n}\log x_i \]

Equating to zero,

\[ \frac{\partial \log L(x; \alpha, k)}{\partial\alpha} = 0 \tag{8.4.7} \]

we get

\[ \begin{aligned}
\frac{n}{\alpha} + n\log k - \sum_{i=1}^{n}\log x_i &= 0 \\
\frac{n}{\alpha} &= \sum_{i=1}^{n}\log x_i - n\log k \\
\frac{n}{\alpha} &= \sum_{i=1}^{n}\log x_i - \sum_{i=1}^{n}\log k \\
\frac{n}{\alpha} &= \sum_{i=1}^{n}\log\frac{x_i}{k} \\
\hat\alpha &= \frac{n}{\sum_{i=1}^{n}\log (x_i/\hat k)}
\end{aligned} \tag{8.4.8} \]

The ML estimators of the Pareto distribution are as follows:

\[ \hat k = \min_i x_i \qquad\text{and}\qquad \hat\alpha = \frac{n}{\sum_{i=1}^{n}\log (x_i/\hat k)} \]

8.5 The Variance and Covariance of these MLE

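The variance and covariance can be found by using the information matrix. The information matrix is the expectation of the negative Hessian matrix (Munir et al., 2012), i.e.

\[ I = -E\left[\frac{\partial^2 \log L}{\partial\theta_i\,\partial\theta_j}\right], \qquad i, j = 1, 2, 3, \ldots \]

The information matrix for the Pareto distribution is

\[ I = -E\begin{bmatrix} \dfrac{\partial^2 \log L}{\partial\alpha^2} & \dfrac{\partial^2 \log L}{\partial\alpha\,\partial k} \\ \dfrac{\partial^2 \log L}{\partial k\,\partial\alpha} & \dfrac{\partial^2 \log L}{\partial k^2} \end{bmatrix} \tag{8.5.1} \]

\[ I = -E\begin{bmatrix} -\dfrac{n}{\alpha^2} & \dfrac{n}{k} \\ \dfrac{n}{k} & -\dfrac{n\alpha}{k^2} \end{bmatrix} \tag{8.5.2} \]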

We have

var I 1
Where

n
2

n
2
I
n

2
n 1 3

and

2
n 1 3

n 1


COV ,

(8.5.3)

(8.5.4)

(8.5.5)

8.6 Method of Moments

The non-central moments of any kind of population are given by

\[ \mu'_r = \int_{-\infty}^{\infty} x^{r} f(x)\,dx \tag{8.6.1} \]

where \(\mu'_r\) is the rth population moment, and the sample moments can be calculated as follows:

\[ m_r = \frac{1}{n}\sum_{i=1}^{n} X_i^{r}, \qquad r = 1, 2, \ldots \tag{8.6.2} \]

To find the estimates through the method of moments we equate the sample moments to the
population.
After some simplification we get
The rth moment is

r E X

r 1

dx

(8.6.3)
After solving equation () we get

r
r

(8.6.4)

and 2

2
2

(8.6.5)

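We equate $\mu'_1 = m_1$ and $\mu'_2 = m_2$ and solve these equations simultaneously for $\alpha$ and $\beta$. Eliminating $\beta$ gives the quadratic $(m_1^2 - m_2)\alpha^2 - 2(m_1^2 - m_2)\alpha + m_1^2 = 0$, and taking the root with $\alpha > 2$ we get

$$\hat{\alpha} = \frac{(m_1^2 - m_2) - \sqrt{m_2 (m_2 - m_1^2)}}{m_1^2 - m_2} = 1 + \sqrt{\frac{m_2}{m_2 - m_1^2}} \tag{8.6.6}$$

$$\hat{\beta} = \frac{m_2}{m_1} \left( 1 - \sqrt{\frac{m_2 - m_1^2}{m_2}} \right) = \frac{m_1 (\hat{\alpha} - 1)}{\hat{\alpha}} \tag{8.6.7}$$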
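As a numerical illustration of the moment estimators, the sketch below solves $\mu'_1 = \alpha\beta/(\alpha - 1) = m_1$ and $\mu'_2 = \alpha\beta^2/(\alpha - 2) = m_2$ through the closed form $\hat{\alpha} = 1 + \sqrt{m_2/(m_2 - m_1^2)}$, $\hat{\beta} = m_1(\hat{\alpha} - 1)/\hat{\alpha}$. The function names are hypothetical, and the check uses moments computed from assumed parameter values rather than from the thesis data.

```python
import math


def sample_moment(sample, r):
    """r-th non-central sample moment m_r = (1/n) * sum(x_i ** r)."""
    return sum(x ** r for x in sample) / len(sample)


def pareto_mom_from_moments(m1, m2):
    """Solve mu'_1 = m1 and mu'_2 = m2 for (alpha, beta), root alpha > 2."""
    spread = m2 - m1 ** 2  # equals the sample variance with divisor n
    if m1 <= 0 or spread <= 0:
        raise ValueError("need m1 > 0 and m2 > m1**2")
    alpha_hat = 1.0 + math.sqrt(m2 / spread)
    beta_hat = m1 * (alpha_hat - 1.0) / alpha_hat
    return alpha_hat, beta_hat


def pareto_mom(sample):
    """Method-of-moments estimates computed from raw observations."""
    return pareto_mom_from_moments(sample_moment(sample, 1),
                                   sample_moment(sample, 2))


# Check with assumed parameters alpha = 3, beta = 2, for which
# mu'_1 = 3*2/(3-1) = 3 and mu'_2 = 3*4/(3-2) = 12.
print(pareto_mom_from_moments(3.0, 12.0))  # prints (3.0, 2.0)
```

Because the closed form always returns $\hat{\alpha} > 2$, this estimator cannot reproduce heavier tails; that is a property of matching the first two moments, not of the code.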

8.7 The variance and covariance matrix

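Let us consider a matrix R of order 2 x 2 to find the asymptotic variances and covariance of the estimates:

$$R = \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix}$$

where

$$r_{11} = \frac{\partial \mu'_1}{\partial \alpha}, \quad r_{12} = \frac{\partial \mu'_1}{\partial \beta} \tag{8.7.1}$$

and

$$r_{21} = \frac{\partial \mu'_2}{\partial \alpha}, \quad r_{22} = \frac{\partial \mu'_2}{\partial \beta} \tag{8.7.2}$$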

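Using $\mu'_1 = \alpha\beta/(\alpha - 1)$ and $\mu'_2 = \alpha\beta^2/(\alpha - 2)$ in (8.7.1) and (8.7.2), the matrix R becomes

$$R = \begin{bmatrix} -\dfrac{\beta}{(\alpha - 1)^2} & \dfrac{\alpha}{\alpha - 1} \\ -\dfrac{2\beta^2}{(\alpha - 2)^2} & \dfrac{2\alpha\beta}{\alpha - 2} \end{bmatrix} \tag{8.7.3}$$

and

$$R^{-1} = \begin{bmatrix} \dfrac{(\alpha - 1)^2 (\alpha - 2)}{\beta} & -\dfrac{(\alpha - 1)(\alpha - 2)^2}{2\beta^2} \\ \dfrac{(\alpha - 1)^2}{\alpha} & -\dfrac{(\alpha - 2)^2}{2\alpha\beta} \end{bmatrix} \tag{8.7.4}$$

$$\left(R^{-1}\right)^T = \begin{bmatrix} \dfrac{(\alpha - 1)^2 (\alpha - 2)}{\beta} & \dfrac{(\alpha - 1)^2}{\alpha} \\ -\dfrac{(\alpha - 1)(\alpha - 2)^2}{2\beta^2} & -\dfrac{(\alpha - 2)^2}{2\alpha\beta} \end{bmatrix} \tag{8.7.5}$$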

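Now we consider

$$R V R^T = \Sigma \quad \Longrightarrow \quad V = R^{-1} \, \Sigma \, \left(R^{-1}\right)^T \tag{8.7.6}$$

where the matrix V is

$$V = \begin{bmatrix} \operatorname{var}(\hat{\alpha}) & \operatorname{cov}(\hat{\alpha}, \hat{\beta}) \\ \operatorname{cov}(\hat{\alpha}, \hat{\beta}) & \operatorname{var}(\hat{\beta}) \end{bmatrix} \tag{8.7.7}$$

and

$$\Sigma = \begin{bmatrix} \operatorname{var}(m_1) & \operatorname{cov}(m_1, m_2) \\ \operatorname{cov}(m_1, m_2) & \operatorname{var}(m_2) \end{bmatrix} \tag{8.7.8}$$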

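Putting the values in equation (8.7.6),

$$V = \begin{bmatrix} \dfrac{(\alpha - 1)^2 (\alpha - 2)}{\beta} & -\dfrac{(\alpha - 1)(\alpha - 2)^2}{2\beta^2} \\ \dfrac{(\alpha - 1)^2}{\alpha} & -\dfrac{(\alpha - 2)^2}{2\alpha\beta} \end{bmatrix} \begin{bmatrix} \operatorname{var}(m_1) & \operatorname{cov}(m_1, m_2) \\ \operatorname{cov}(m_1, m_2) & \operatorname{var}(m_2) \end{bmatrix} \begin{bmatrix} \dfrac{(\alpha - 1)^2 (\alpha - 2)}{\beta} & \dfrac{(\alpha - 1)^2}{\alpha} \\ -\dfrac{(\alpha - 1)(\alpha - 2)^2}{2\beta^2} & -\dfrac{(\alpha - 2)^2}{2\alpha\beta} \end{bmatrix} \tag{8.7.9}$$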

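Solving this we get

$$\operatorname{var}(\hat{\alpha}) = \frac{(\alpha - 1)^2 (\alpha - 2)^2 \left[ 4\beta^2 \operatorname{var}(m_1)(\alpha - 1)^2 + \operatorname{var}(m_2)(\alpha - 2)^2 - 4\beta \operatorname{cov}(m_1, m_2)(\alpha - 1)(\alpha - 2) \right]}{4\beta^4} \tag{8.7.10}$$

$$\operatorname{var}(\hat{\beta}) = \frac{4\beta^2 \operatorname{var}(m_1)(\alpha - 1)^4 + \operatorname{var}(m_2)(\alpha - 2)^4 - 4\beta \operatorname{cov}(m_1, m_2)(\alpha - 1)^2 (\alpha - 2)^2}{4\alpha^2 \beta^2} \tag{8.7.11}$$

$$\operatorname{Cov}(\hat{\alpha}, \hat{\beta}) = \frac{(\alpha - 1)(\alpha - 2) \left[ 4\beta^2 \operatorname{var}(m_1)(\alpha - 1)^3 + \operatorname{var}(m_2)(\alpha - 2)^3 - 2\beta \operatorname{cov}(m_1, m_2)(\alpha - 1)(\alpha - 2)^2 - 2\beta \operatorname{cov}(m_1, m_2)(\alpha - 1)^2 (\alpha - 2) \right]}{4\alpha\beta^3} \tag{8.7.12}$$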
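The matrix relation $V = R^{-1} \Sigma (R^{-1})^T$ can also be evaluated numerically instead of through the expanded formulas. The sketch below is one way to do this in plain Python for the 2 x 2 case, assuming the Pareto moments $\mu'_1 = \alpha\beta/(\alpha - 1)$ and $\mu'_2 = \alpha\beta^2/(\alpha - 2)$; the function name and the input values are hypothetical and only illustrate the calculation.

```python
def pareto_mom_covariance(alpha, beta, var_m1, var_m2, cov_m12):
    """Return (var_alpha, var_beta, cov_alpha_beta) from V = R^-1 S (R^-1)^T.

    R is the Jacobian of (mu'_1, mu'_2) with respect to (alpha, beta) and
    S is the covariance matrix of the sample moments (m1, m2).
    """
    if alpha <= 2:
        raise ValueError("the second moment requires alpha > 2")
    # Elements of R (partial derivatives of the first two moments).
    r11 = -beta / (alpha - 1) ** 2
    r12 = alpha / (alpha - 1)
    r21 = -2 * beta ** 2 / (alpha - 2) ** 2
    r22 = 2 * alpha * beta / (alpha - 2)
    det = r11 * r22 - r12 * r21
    # Elements of R^-1 = [[a, b], [c, d]].
    a, b = r22 / det, -r12 / det
    c, d = -r21 / det, r11 / det
    # Quadratic forms giving the elements of V.
    var_alpha = a * a * var_m1 + 2 * a * b * cov_m12 + b * b * var_m2
    var_beta = c * c * var_m1 + 2 * c * d * cov_m12 + d * d * var_m2
    cov_ab = a * c * var_m1 + (a * d + b * c) * cov_m12 + b * d * var_m2
    return var_alpha, var_beta, cov_ab


# Hypothetical inputs, chosen only to exercise the function.
va, vb, cab = pareto_mom_covariance(alpha=5.0, beta=2.0,
                                    var_m1=0.1, var_m2=4.0, cov_m12=0.5)
print(va, vb, cab)
```

In practice `var_m1`, `var_m2` and `cov_m12` would themselves be estimated from the data, which needs sample moments up to order four.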

8.8 Probability Weighted Moments

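The method of probability weighted moments (MPWM) is also used to estimate the parameters of the Pareto distribution.

The PWM are

$$M_{1,0,r} = E\left[ X \{ 1 - F(X) \}^r \right] \tag{8.8.1}$$

The equation is taken from

The PWM for the Pareto distribution are as follows:

$$M_{1,0,r} = E\left[ X \left( \frac{\beta}{X} \right)^{\alpha r} \right] = \beta^{\alpha r} E\left[ X^{1 - \alpha r} \right] \tag{8.8.2}$$

where

$$E\left[ X^{1 - \alpha r} \right] = \frac{\alpha \beta^{1 - \alpha r}}{\alpha (r + 1) - 1} \tag{8.8.3}$$

so that

$$M_{1,0,r} = \frac{\alpha \beta}{\alpha (r + 1) - 1}$$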
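The PWM expression above can be turned into estimators by matching two sample PWMs. The sketch below is an assumption, since the text does not state which orders are matched: it uses $r = 0$ and $r = 1$, for which $M_{1,0,0} = \alpha\beta/(\alpha - 1)$ and $M_{1,0,1} = \alpha\beta/(2\alpha - 1)$, giving $\hat{\alpha} = (a_0 - a_1)/(a_0 - 2a_1)$ and $\hat{\beta} = a_0(\hat{\alpha} - 1)/\hat{\alpha}$. The sample PWM uses the usual unbiased order-statistic estimator; all function names are hypothetical.

```python
import math


def pareto_pwm(alpha, beta, r):
    """Theoretical E[X (1 - F(X))^r] = alpha*beta / (alpha*(r + 1) - 1)."""
    denom = alpha * (r + 1) - 1
    if denom <= 0:
        raise ValueError("requires alpha * (r + 1) > 1")
    return alpha * beta / denom


def sample_pwm(sample, r):
    """Unbiased estimate of E[X (1 - F(X))^r] from the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    if n <= r:
        raise ValueError("need more observations than the order r")
    # Weight for the i-th smallest value is C(n - i, r) / C(n - 1, r).
    total = sum(math.comb(n - i, r) * x for i, x in enumerate(xs, start=1))
    return total / (n * math.comb(n - 1, r))


def pareto_from_pwm(a0, a1):
    """Solve the r = 0 and r = 1 PWM equations for (alpha, beta)."""
    if a0 <= 2 * a1:
        raise ValueError("need a0 > 2 * a1")
    alpha_hat = (a0 - a1) / (a0 - 2 * a1)
    beta_hat = a0 * (alpha_hat - 1) / alpha_hat
    return alpha_hat, beta_hat


def pareto_pwm_estimates(sample):
    """PWM estimates of (alpha, beta) from raw observations."""
    return pareto_from_pwm(sample_pwm(sample, 0), sample_pwm(sample, 1))


# Round trip with assumed parameters alpha = 3, beta = 2.
print(pareto_from_pwm(pareto_pwm(3.0, 2.0, 0), pareto_pwm(3.0, 2.0, 1)))
```

Unlike the moment estimators, this version only needs $\hat{\alpha} > 1$, so it can be applied to heavier-tailed samples.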

Table (a) Flood sites

Site        $\hat{\alpha}$ (MOM)   $\hat{\beta}$ (MOM)   $\hat{\alpha}$ (MLE)   $\hat{\beta}$ (MLE)
Terbela     5.108                  308867.051            3.1168                 272000
Mangla      2.1262                 140747.95             0.47887                22400
Shahdara    2.433                  52535.04              0.87469                22205
Marala      4.8892                 218760.1144           1.0463                 109780

Table (b) Rainfall Sites

Site              $\hat{\alpha}$ (MOM)   $\hat{\beta}$ (MOM)   $\hat{\alpha}$ (MLE)   $\hat{\beta}$ (MLE)
Muzaffarabad      0.1430                 130.566               2.288                  80.84
Dera Ghazi Khan   3.4601                 16.8431               1.7071                 10.43
Risalpur          4.3830                 268.8135              1.1092                 130.67
Dir               10.004                 120.1379              3.8409                 106.32
Kohat             7.6423                 124.9403              4.8004                 115.45
Balakot           5.0974                 333.8674              1.6041                 230.67

The figure is taken from Pakistan Flood Impact Assessment report 2011-2012.
