
Prediction of the Level of Air Pollution Using Principal Component Analysis and Artificial Neural Network Techniques: a Case Study in Malaysia

Article in Water Air and Soil Pollution, July 2014

Impact Factor: 1.55; DOI: 10.1007/s11270-014-2063-1

Available from: Azman Azid
Retrieved on: 29 May 2016
Water Air Soil Pollut (2014) 225:2063
DOI 10.1007/s11270-014-2063-1

Azman Azid & Hafizan Juahir & Mohd Ekhwan Toriman &
Mohd Khairul Amri Kamarudin & Ahmad Shakir Mohd Saudi &
Che Noraini Che Hasnam & Nor Azlina Abdul Aziz &
Fazureen Azaman & Mohd Talib Latif & Syahrir Farihan Mohamed Zainuddin &
Mohamad Romizan Osman & Mohammad Yamin

Received: 6 April 2014 / Accepted: 2 July 2014


© Springer International Publishing Switzerland 2014

Abstract This study focused on the pattern recognition of Malaysian air quality based on the data obtained from the Malaysian Department of Environment (DOE). Eight air quality parameters in ten monitoring stations in Malaysia for 7 years (2005–2011) were gathered. Principal component analysis (PCA) in the environmetric approach was used to identify the sources of pollution in the study locations. The combination of PCA and artificial neural networks (ANN) was developed to determine its predictive ability for the air pollutant index (API). The PCA has identified that CH4, NmHC, THC, O3, and PM10 are the most significant parameters. The PCA-ANN showed better predictive ability in the determination of API with fewer variables, with R2 and root mean square error (RMSE) values of 0.618 and 10.017, respectively. The work has demonstrated the importance of historical data in sampling plan strategies to achieve desired research objectives, as well as to highlight the possibility of determining the optimum number of sampling parameters, which in turn will reduce costs and time of sampling.

Keywords Environmetric . Pattern recognition . Principal component analysis . Artificial neural network

A. Azid : H. Juahir (*) : M. E. Toriman : M. K. A. Kamarudin : A. S. M. Saudi : C. N. C. Hasnam : N. A. A. Aziz : F. Azaman : M. Yamin
East Coast Environmental Research Institute (ESERI), Universiti Sultan Zainal Abidin, Gong Badak Campus, 21300 Kuala Terengganu, Terengganu, Malaysia
e-mail: hafizanj@gmail.com

M. T. Latif
School of Environmental and Natural Resource Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia

S. F. M. Zainuddin : M. R. Osman
Kulliyyah of Science, International Islamic University Malaysia, 25200 Kuantan, Pahang, Malaysia

1 Introduction

Air pollution is a serious issue that needs to be given immediate and serious attention by all relevant authorities around the globe, as it is one of the most important factors that contributes to the quality of life and living. Air pollution can be defined as a condition in which air pollutants are found in the atmosphere at high enough concentrations, above their normal ambient levels (Seinfeld and Pandis 1998). The pollutants in the atmosphere may disperse and/or concentrate during varied time periods. This problem exists, especially in urban areas where there are very high concentrations of population and manufacturing industries (Azmi et al. 2010). The major sources of air pollutants in Malaysia are mobile, stationary, and trans-boundary sources (Afroz et al. 2003; Jamal et al. 2004; Department of Environment Malaysia DOE 2005; Mutalib et al.
2013). The mobile sources are mainly attributed to motor vehicle emissions (Awang et al. 2000; Mutalib et al. 2013), which contributed 82 % of the air emission load in Peninsular Malaysia (Yahaya et al. 2006). The stationary sources in Peninsular Malaysia are industrial fuel burning, power stations, industrial production processes, domestic and commercial furnaces, and open burning at solid waste disposal sites, which contributed 9, 5, 3, 0.2, and 0.8 % of the total air emission load, respectively (Yahaya et al. 2006). Uncontrolled burning to clear land by plantation owners and farmers in Indonesia in 2006 is a case of a trans-boundary pollution source that affected neighboring countries (Jamal et al. 2004). Malaysia is one of the countries that experienced the trans-boundary haze episodes, and the worst occurrence was in 1997 (Brauer and Jamal 1998).

The air pollutant index (API) can be defined as a number that is used for reporting the quality of air with respect to its effects on human health (Bishoi et al. 2009). The API is an important tool to inform the public about how clean or polluted the atmosphere is and what associated health effects might be of concern. The API is reported daily and is calculated using the five major air pollutants (ozone (O3), particulate matter below 10 µm (PM10), carbon monoxide (CO), sulfur dioxide (SO2), and nitrogen dioxide (NO2)) regulated by the Clean Air Act (Department of Environment DOE 2014). To recover and maintain air quality and protect public health, the Malaysian Department of Environment (DOE) has set up the API and established national air quality standards through the Recommended Malaysian Air Quality Guidelines (RMAQG) for each of these pollutants (Mutalib et al. 2013; Dominick et al. 2012). The higher the level of the API, the greater the level of air pollution and the greater the health concern. The status of air quality in Malaysia has been governed by the RMAQG, issued by the DOE, since 1989 (Mutalib et al. 2013; Dominick et al. 2012).

Controlling the source of air pollutants is one of the major challenges in the world. In Malaysia, the DOE has been consistently monitoring the air quality status and collecting data in order to inform people about major pollutant concentrations in real time (Dragomir 2010). Once a lack of compliance is determined, the data can be used to advise or caution the decision makers or planners to avoid health effects (Kamal et al. 2006; Mutalib et al. 2013). Two major air pollutants are PM10 and O3, particularly in the urban and suburban areas in Malaysia (Dominick et al. 2012; Latif et al. 2012; Mutalib et al. 2013), and have been recognized as two of the major concerns that have high potential for deleterious effects on health (Mahiyudin et al. 2013; Mazrura et al. 2011; Siew et al. 2008; Afroz et al. 2003).

Although the DOE has organized programs on air pollution in Malaysia which have provided important data, extensive analyses on these valuable data using statistical tools such as environmetric techniques for pattern recognition have not been fully applied. Environmetric techniques (also termed chemometrics) are a sub-set of multivariate statistical modelling and data treatment. These techniques are the best approaches to apply to a large amount of complex environmental monitoring data because they can avoid misinterpretation of results produced during data analysis (Mutalib et al. 2013). Environmetric techniques have been applied in this study for pattern recognition, which provides an understanding of the important patterns and underlying relationships in data, for getting deeper insights into the data and for modelling and visualizing complex data for more insightful analysis. Different environmetric techniques such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA), and discriminant analysis (DA) have been extensively applied in many scientific studies over the last few years (Mutalib et al. 2013; Singh et al. 2004, 2005), especially in air quality monitoring. Examples of air quality monitoring studies that used environmetric techniques are Dominick et al. (2012), Wei-Zhen et al. (2011), Juneng et al. (2009), Lau et al. (2009), and Pires et al. (2008a, b, 2009). The application of these methods for interpretation of the complex databases permits understanding of the air quality in the study area, especially for developing suitable plans for efficient management of the air quality monitoring programs.

The prediction of air quality using a simple mathematical formula faces various challenges. It is unable to capture the non-linear relationship among different variables (Mahboubeh et al. 2012) due to the complexity of the data (Mutalib et al. 2013). The data on air quality are found as stochastic time series, thereby making it possible to make a prediction on the basis of historical data (Giorgio and Piero 1996). Prediction of air quality is important to afford planning, proper actions, and controlling strategies to minimize problems in human health.

Several methods have been used for modelling air quality. The artificial neural network (ANN) is one of
the methods and has become a popular and useful tool for modelling environmental systems (Maier and Dandy 2001). ANN is a technique with greater flexibility, efficiency, and accuracy, since it has an attractive feature which is similar to the brain. The ANN can be trained to identify non-linear patterns between input and output values and can solve complex problems much faster than digital computers (Rahman et al. 2013; Zhang et al. 1998), and is able to train accurately when presented with new data (Azid et al. 2013; Afzali et al. 2012; Mahboubeh et al. 2012; Kurt et al. 2008; Sohn et al. 1999). Previous studies done by Mutalib et al. (2013), Alkasassbeh et al. (2013), Brunelli et al. (2007), Tecer (2007), Perez and Reyes (2006), and Niska et al. (2004, 2005) prove that ANN is very well suited for solving environmental problems, especially in the analysis of air pollution. The ANN technique also can perform tasks based on training or initial experience, and it does not need an algorithm to solve a problem. This is because it can generate its own distribution of the weights of the links through learning and is easily inserted into the existing technology (García et al. 2011).

The objectives of the study are to identify the potential sources of variations in air quality and to implement the integration of data-driven modelling in predicting the Malaysian Air Pollutant Index (API) at selected stations.

2 Materials and Methods

2.1 Study Area

Ten continuous air monitoring stations were selected. The stations are Pasir Gudang (STA01), Kuching (STA02), Bukit Rambai (STA03), Tasek (STA04), Nilai (STA05), Klang (STA06), Balok Baru (STA07), Pengkalan Chepa (STA08), Paka (STA09), and Labuan (STA10). The coordinates and locations of the air quality monitoring stations are shown in Table 1 and Fig. 1, respectively. Eight stations are located in Peninsular Malaysia and another two are in East Malaysia. These stations, which lie in heavily industrialized areas, residential areas, and areas surrounded by congested main roads, were selected due to their location differences. Based on the Malaysian DOE (2010) report, the overall status of air quality in Malaysia is within good and moderate levels most of the time. There are no occurrences of major natural disasters (such as typhoons, volcanic eruptions, and earthquakes) in these areas. The value of the API in Malaysia is usually influenced by the concentration of suspended particulate matter (PM10) (Awang et al. 2000) because the concentration of PM10 is always higher than that of other pollutants (Othman et al. 2010).

2.2 Data Collection

The air quality data were gathered from the Air Quality Division of the DOE, Malaysia. The data were collected and monitored by Alam Sekitar Malaysia Sdn. Bhd. (ASMA), the authorized agency for DOE. All stations were identified based on the availability of data starting from January 1, 2005, to December 31, 2011. The air quality variables used in this study are carbon monoxide (CO), ozone (O3), particulate matter under 10 µm (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), methane (CH4), non-methane hydrocarbon (NmHC), and total hydrocarbon (THC). The measurements recorded for these variables are in the form of hourly readings. The equipment used by ASMA to monitor the air quality data are from Teledyne Technologies Inc.,

Table 1 Location and coordinates (longitude and latitude) of air quality monitoring stations in the study area

STA. ID   Location                     Latitude      Longitude
STA01     Pasir Gudang, Johor          N01 28.225    E103 53.637
STA02     Kuching, Sarawak             N01 33.734    E110 23.329
STA03     Bukit Rambai, Melaka         N02 15.510    E102 10.364
STA04     Tasek, Perak                 N04 37.781    E101 06.964
STA05     Nilai, Negeri Sembilan       N02 49.246    E101 48.877
STA06     Klang, Selangor              N03 00.620    E101 24.484
STA07     Balok Baru, Pahang           N03 57.726    E103 22.955
STA08     Pengkalan Chepa, Kelantan    N06 09.520    E102 17.262
STA09     Paka, Terengganu             N04 35.880    E103 26.096
STA10     Labuan                       N05 19.980    E115 14.315
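For mapping or distance calculations, the coordinates in Table 1 usually need to be in decimal degrees. The sketch below assumes the printed values are a hemisphere letter followed by whole degrees and decimal minutes (for example, N01 28.225 read as 1 degree 28.225 minutes north); the paper does not state the format explicitly, and the function name `ddm_to_decimal` is a hypothetical helper, not something used by the authors.

```python
import re

def ddm_to_decimal(text: str) -> float:
    """Convert 'N01 28.225' (hemisphere, degrees, decimal minutes) to decimal degrees.

    Assumption: Table 1 uses degrees plus decimal minutes; the degree and
    minute symbols are not present in the extracted text.
    """
    match = re.fullmatch(r"\s*([NSEW])\s*(\d{1,3})\s+(\d{1,2}(?:\.\d+)?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    hemisphere, degrees, minutes = match.groups()
    value = int(degrees) + float(minutes) / 60.0
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value

# Two rows transcribed from Table 1 as (latitude, longitude).
stations = {
    "STA01": ("N01 28.225", "E103 53.637"),
    "STA10": ("N05 19.980", "E115 14.315"),
}

decimal = {
    station_id: (ddm_to_decimal(lat), ddm_to_decimal(lon))
    for station_id, (lat, lon) in stations.items()
}
```

Under that reading, all ten stations fall between roughly 1 and 6 degrees north, which is consistent with the locations named in the table.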
Fig. 1 Location of the ten selected air quality monitoring stations in Malaysia

USA, and Met One Instruments Inc., USA. According to the ASMA (2007) report, Standard Operating Procedures for Continuous Air Quality Monitoring (2006), the analyzer used by ASMA to monitor PM10 was a BAM-1020 Beta Attenuation Mass Monitor from Met One Instruments Inc., USA. This instrument has a fairly high resolution of 0.1 µg m⁻³ at a 16.7 L min⁻¹ flow rate, with lower detection limits of <4.8 µg m⁻³ and <1.0 µg m⁻³ for 1 and 24 h, respectively. The instruments used by ASMA to monitor SO2, NOx, CO, and O3 were the Teledyne API Model 100A/100E, Teledyne API Model 200A/200E, Teledyne API Model 300/300E, and Teledyne API Model 400/400E, respectively, from Teledyne Technologies Inc., USA. SO2 measurement was based on the UV fluorescence method, where the lowest level of detection is 0.4 ppb. NOx measurement used the chemiluminescence detection method with the same detection limit as the SO2 analyzer. CO was measured using the non-dispersive infrared absorption (Beer-Lambert) method with 0.5 % precision and a lowest detection level of 0.04 ppm, while O3 was measured through the UV absorption (Beer-Lambert) method with a detection limit of 0.4 ppb. The measurements of SO2, NOx, CO, and O3 were at a precision level of 0.5 %. THC, CH4, and NmHC were measured using a Teledyne API M4020 analyzer from Teledyne Technologies Inc., USA, which is equipped with a flame-ionization detector (FID) and has a measurement accuracy of 1 %. These instruments were used due to their well-proven accuracy, reliability, and robustness.

2.3 Data and Data Pre-treatment

A total of 202,080 data points (8 variables × 25,260 data sets) were utilized in this analysis. The total number of missing data in the data points was very small (3 %) compared to the overall data. In order to facilitate the data analysis, the nearest neighbor method was applied through the use of the XLSTAT 2014 add-in software
(Junninen et al. 2004; Dominick et al. 2012; Azid et al. 2013). This method examines the distance between each point and the closest point to it. The nearest neighbor method is the simplest scheme, where the end points of the gaps are used as estimates for all missing values (Junninen et al. 2004; Dominick et al. 2012). The equation applied in this method is shown in Eq. 1:

$$ y = \begin{cases} y_1 & \text{if } x \le x_1 + (x_2 - x_1)/2 \\ y_2 & \text{if } x > x_1 + (x_2 - x_1)/2 \end{cases} \qquad (1) $$

where y is the interpolant, x is the time point of the interpolant, y1 and x1 are the coordinates of the starting point of the gap, and y2 and x2 are the endpoints of the gap.

2.4 Environmetric Analysis and Modelling

2.4.1 Principal Component Analysis

The dimension of a huge data set can be trimmed down by using principal component analysis (PCA), which is considered as one of the most prevalent and useful statistical methods for uncovering the potential structure of a set of variables. This method is used for explaining the variance of a large set of interrelated variables by transforming them into a new, smaller set of uncorrelated (independent) variables, namely principal components (PCs) (Simeonov et al. 2003; Dominick et al. 2012; Mutalib et al. 2013). PCs are orthogonal and uncorrelated to each other and are linear combinations of the original variables (Abdul-Wahab et al. 2005; Viana et al. 2006; Škrbić et al. 2007; Juneng et al. 2009). PCA has the ability to show the most significant variables which can indicate the source of the pollutants. This is because, in the analysis process, the variables that are less significant are omitted from the data set with a minimal loss of original data (Singh et al. 2004, 2005; Shrestha and Kazama 2007). The raw air quality variables were standardized through Z-scale transformation to a mean of 0.0 and variance of 1.0 by applying Eq. 2:

$$ Z_{ij} = \frac{X_{ij} - \mu}{\sigma} \qquad (2) $$

where Z_ij is the jth value of the standard score of the measured variable i, X_ij is the jth observation of variable i, μ is the variable's mean value, and σ is the standard deviation. The Z-scale transformation method was used to ensure that the different air quality variables had equal weights in the statistical analysis process. Besides, these transformations will homogenize the variance of the distribution and prevent any classification errors that may occur from groups described by variables of completely different sizes (Simeonov et al. 2002). Then, the data matrix was decomposed into scores or components and loadings (correlations between the original variables and the PCs extracted by the analysis) for the variables.

Bartlett's test of sphericity was performed at the beginning of the PCA in order to examine the correlation of the variables used in the PCA (Tabachnick and Fidell 2001). According to Bartlett (1950), this test is able to estimate the probability that there are correlations in a matrix. The null hypothesis, H0, of this test states that there is no correlation significantly different from 0 between the variables, while the alternative hypothesis, Ha, states that at least one of the correlations between the variables is significantly different from 0. As the computed p value is lower than the significance level alpha=0.05, one should reject the null hypothesis H0, and accept the alternative hypothesis Ha. The risk to reject the null hypothesis, H0, while it is true is lower than 0.01 %. When the null hypothesis, H0, is rejected, then it is confirmed that the variables used in the PCA are correlated.

The Kaiser-Meyer-Olkin (KMO) test was carried out in order to measure the sampling adequacy. The anti-image matrices measure the sampling adequacy for each variable along the diagonal and the negatives of the partial correlation/covariances on the off-diagonals. The diagonal elements should be greater than 0.5 at a bare minimum if the sample is adequate for a given pair of variables (Field 2000, 2005). If any pair of variables has a value which is less than 0.5, consider dropping one of them from the analysis.

The PCs generated by PCA sometimes are not readily interpreted and should be rotated using any of a number of applicable methods such as varimax rotation. The goal of varimax rotation is to minimize the complexity of the components by making the large loadings larger and the small loadings smaller within each component. The varimax rotation method was applied because this method simplifies the factor structure and, therefore, makes its interpretation easier and more reliable. In the varimax rotation method, only the PCs with eigenvalues greater than 1 (>1) are used and considered significant (Kim and Mueller 1987) in order to obtain the new variables, known as varifactors (VFs) or factor loadings. This approach is known as the Kaiser Criterion. The Kaiser Criterion is used to solve the problem of the number of
components to be retained (Kaiser 1958). The numbers of VFs obtained by varimax rotations are equal to the number of variables in accordance with common features and can include unobservable, hypothetical, and latent variables. The VFs are values that are used to measure the correlation between variables. VF values which are greater than 0.75 (>0.75) are considered strong, the values ranging from 0.50 to 0.75 (0.50 ≤ factor loading ≤ 0.75) are considered moderate, and the values ranging from 0.30 to 0.49 (0.30 ≤ factor loading ≤ 0.49) are considered weak factor loadings (Liu et al. 2003). In this study, the VFs with absolute values greater than 0.75 were set as the selection threshold. Then, the results of factor scores after varimax rotation were used for artificial intelligence modelling. The PCA was examined using XLSTAT 2014 add-in software.

2.5 Artificial Neural Network Model

The artificial neural network (ANN) is a branch of artificial intelligence, designed to imitate the biological brain system (Moustris et al. 2010). The ANN can learn complex data patterns and can be applied to the activities of predicting, clustering, and classification (Hakimpoor et al. 2011). The ANN technique is able to provide better prediction, the results of which depend on the number of input variables (Chaloulakou et al. 2003).

In this study, the multilayered perceptron feed-forward artificial neural network (MLP-FF-ANN) was applied and computed by the JMP 10 software, which offers flexibility and ease of application. This network type consists of a system of layered interconnected neurons or nodes, which are arranged to form three layers, namely input layers (independent variables), hidden layers (one or more), and the output layers (dependent variables) (Abdul-Wahab and Al-Alawi 2002; Yetilmezsoy and Demirel 2007), with nodes in each layer connected to each other (Hornik et al. 1989; Jain et al. 1996).

Two main phases were involved in this network, namely forward and backward phases. During the forward phase, the training data point is propagated through the hidden layer and the output value obtained is compared with the actual target values to calculate the error between them (Cigizoglu and Kisi 2006). Then, the calculated error is propagated back towards the hidden layer. The output weightings of the node are adjusted to reduce the errors with each iteration, resulting in improved models. This implies a direction of information processing (Sun et al. 2008).

As with any other prediction models, the number and selection of input parameters are very important in ANN modelling. The combination of the absolute principal component score obtained from PCA as an input variable to the multilayer perceptron ANN has been done previously by Bucinski et al. (2005), O'Farrella et al. (2005), Sousa et al. (2007), Liu et al. (2007), Ravi and Pramodh (2008), and Zelkic-Susac et al. (2010, 2013), who proved that this method is very well suited for decision making and problem solving in different areas. In this study, rotated PCA data points were used instead of an unrotated PC, as it provides intermediate diagnostic information about the groupings of the chronologies.

Two models were developed and compared, namely MLP-FF-ANN model A (as a reference) and MLP-FF-ANN model B. The eight variables were used as input layers in MLP-FF-ANN model A. The result of factor scores obtained after varimax rotation in PCA was used as input layers in MLP-FF-ANN model B. The numbers of hidden layers were based on the nature of the problem under study and have been varied depending on the predicting horizon. The output layers were the API values. Figure 2 shows the network structure of MLP-FF-ANN models A and B. A trial-and-error procedure with one to ten hidden nodes in the network structure was examined in order to approximate any non-linear function with any level of accuracy, and it was used to search for the best model for prediction values. Based on theoretical studies, a network with a small number of nodes will probably fail to learn the data, while too many nodes will overfit the training patterns in the network and give a poor generalization performance result, especially when dealing with noisy data in predicting problems (Chaloulakou et al. 2003). In this analysis, about 202,080 data points were used in MLP-FF-ANN model A, while 50,520 data points were used in MLP-FF-ANN model B.

According to Rech (2002), a data point should be classified into three groups, namely, training, testing, and validating sub-sets. The analysis of ANN undergoes three phases: training (60 %), testing (20 %), and validation (20 %) of the data (Xie et al. 2009). The training sub-set is used to estimate and learn the parameter patterns in the data point. The testing sub-set is used to evaluate the generalization ability of the supposedly trained network. The validating sub-set is responsible for performing the final check on the trained network.

To evaluate the results of the ANN models in this present study, two performance features were
Fig. 2 Example of a network structure of a multilayer perceptron feed-forward artificial neural network: (i) MLP-FF-ANN model A, with the eight (8) air pollutant variables as input, and (ii) MLP-FF-ANN model B, with the factor scores after varimax rotation as input. In both panels the network runs from the input layer through the hidden layer to the output (API).

considered for the model evaluation, namely coefficient of determination (R2) and root mean square error (RMSE). The model with the highest value of R2 and the lowest value of RMSE is declared the best linear model (Norusis 1990; Azid et al. 2013). Then, the predicted values of the models were compared with each other to obtain a parsimonious model (a model that depends on as few variables as necessary).

3 Results and Discussion

3.1 Air Pollutant Source Identification

Bartlett's test of sphericity revealed that the air quality data met the sphericity assumption since it had an observed chi-square value of 1.85 × 10^5 (p<0.05, df=28), therefore confirming that the air quality variables were correlated and not orthogonal. This suggests that PCA will allow for interpretation of the variability in the data with less than the original number of variables (McNeil et al. 2005). The Kaiser-Meyer-Olkin (KMO) test of these data shows that the overall sampling adequacy (0.515) was greater than 0.5 (Table 2). On this basis, the variables were considered adequate and were retained for further analysis. The estimation of the factor loadings was carried out for assessing the correlations between air quality variables and the extracted factors.

After varimax rotation, from eight PCs, only two VFs which represent 64.24 % of the variance of the data were selected due to the eigenvalues greater than 1 (>1.0). Despite the cumulative variance being less than 70 %, the cutoff point of the factors was determined using a scree plot graph (Fig. 3). The eigenvalues less than 1
(<1.0) are neglected because of being redundant with more important factors. It means multicollinearity was present among original variables.

Table 2 Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy

Pollutant   Result
CO          0.637
O3          0.578
PM10        0.765
SO2         0.446
NO2         0.681
CH4         0.481
NMHC        0.442
THC         0.486
KMO         0.515

In this study, the VFs with absolute values greater than 0.75 were set as the selection threshold because these values are solid and stable, which exhibit moderate to strong loadings on the extracted factors. Table 3 and Fig. 4 highlight that five out of the eight air quality variables satisfy the 0.75 factor loading threshold. These variables are CH4, NmHC, THC, O3, and PM10. These pollutants are then classified as the major contributing pollutants at the selected monitoring stations in Malaysia. The VF1 contributes about 32.33 % of the variation in the air quality data. It has high loadings from three variables, which are CH4 (0.949), NmHC (0.79), and THC (0.99). This factor can be interpreted as a measure of gaseous pollutants. Considering the nature of these three air quality variables, this factor is mostly related to the processes of petrochemical production from petrochemical industries and the fuel combustion from transportation activities (Koppmann 2007). Besides, it is also related to the process of biomass burning and grazing and the residual of agricultural products from agricultural activities (Haiduc and Beldean-Gale 2011).

The VF2 demonstrates 31.91 % of the variance in the data. It exhibits high loading from O3 (0.75) and PM10 (0.84). The concentration of these pollutants is related to the secondary pollutant (O3) and non-gas pollutant (PM10). O3 is released into the atmosphere as a result of photochemical oxidation and is the main component of smog (Banan et al. 2013). The concentration of O3, especially in urban and suburban areas, is contributed by the mono-nitrogen oxide (NOx) (Sadanaga et al. 2012) and the downwind plume of O3 precursors from the industrial activities (Wei et al. 2012; Monteiro et al. 2012). PM10 is the main component of dust fall, which comes from industrial activities and construction sites (Pandey et al. 2008), the transportation exhaust emission and soil dust (Arsene et al. 2007), and also open burning activity around the study area. According to the Malaysian Ministry of Transport, MOT (2010), the total amount of new registered motor vehicles in Malaysia increased by 4.42 % from 934,367 in 2004 to 1,160,082 in 2010. Based on this information, motor vehicles in Malaysia are one of the major factors that contribute to the deterioration of atmospheric conditions.

Fig. 3 Scree plots for PCA (factors F1 to F8 on the horizontal axis)
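The PCA steps described in Sect. 2.4.1 and applied here (Z-scale standardization, eigen-decomposition of the correlation matrix, the Kaiser Criterion, and varimax rotation) can be sketched with NumPy as follows. This is not the XLSTAT procedure used by the authors; it is a minimal illustration on synthetic data (two hypothetical latent sources driving eight variables), and the varimax routine is the widely used SVD-based iteration rather than anything taken from the paper.

```python
import numpy as np


def zscore(X: np.ndarray) -> np.ndarray:
    """Standardize each column to mean 0 and variance 1 (Eq. 2)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_loadings(Z: np.ndarray):
    """Return all eigenvalues and the loadings of the PCs with eigenvalue > 1."""
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                              # Kaiser Criterion
    return eigvals, eigvecs[:, keep] * np.sqrt(eigvals[keep])


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-8) -> np.ndarray:
    """Orthogonally rotate loadings to maximize the varimax criterion."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        gradient = loadings.T @ (
            rotated**3 - rotated * (rotated**2).sum(axis=0) / n_vars
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        current = s.sum()
        if previous != 0.0 and current / previous < 1.0 + tol:
            break
        previous = current
    return loadings @ rotation


# Synthetic stand-in for the monitoring data (NOT the paper's data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
weights = np.array([
    [0.9, 0.8, 0.9, 0.1, 0.0, 0.1, 0.2, 0.0],
    [0.0, 0.1, 0.1, 0.8, 0.9, 0.7, 0.6, 0.5],
])
X = latent @ weights + 0.5 * rng.normal(size=(500, 8))

Z = zscore(X)
eigvals, loadings = pca_loadings(Z)
rotated = varimax(loadings)
cumulative = 100.0 * np.cumsum(eigvals) / eigvals.sum()  # scree-plot cumulative line
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of variance between the retained factors shifts, which is why the rotated factors in Table 3 share the 64.24 % total almost evenly.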
3.2 Prediction of the Air Pollution Index Using Multilayered Perceptron Feed-Forward Artificial Neural Network

In the rotated PCA, two VFs with eigenvalues greater than 1 (>1) were retained and used as an input variable in the MLP-FF-ANN models. These two VFs represented the air pollutants CH4, NmHC, THC, O3, and PM10. Based on factor score coefficients, the factor scores after varimax rotation are then used in MLP-FF-ANN model B. In the development of MLP-FF-ANN models, 20 network structures were tested. The summary of the results for R2 and RMSE of both of the models is presented in Table 4. Figure 5a, b shows the scatter plot between actual and predicted API for both models using rotated PCA scores and original raw data as input variable data. Seven to eight hidden nodes are considered optimum, as a further increase in the complexity of the topology resulted in no significant improvement over the prediction performance, as given in Fig. 6a, b. Using the eight variables as input in MLP-FF-ANN model A, the R2 and RMSE values are 0.615 and 10.026, respectively. By considering VF scores as input in MLP-FF-ANN model B, the R2 and RMSE values are 0.618 and 10.017, respectively. It can be observed that the prediction performance of the ANN model using original principal component scores is not significantly different from the rotated PCA models. However, as rotated PCA models use fewer variables and are far less complex, the advantage of MLP-FF-ANN model B is more obvious than that of MLP-FF-ANN model A. Comparatively, the prediction performance in MLP-FF-ANN model B is better than that of MLP-FF-ANN model A. Enabling a significant reduction of input data using rotated PCA scores, it shows a predictive power at least as good as the one given by the standard model and it can be concluded that this technique may be seen as a

Fig. 4 Factor loading plot after varimax rotation (variables plotted on axes D1, 32.33 %, and D2, 31.91 %; 64.24 % in total)

Table 3 Varifactors after varimax rotation and the possible source category in the study area

Variable          VF1      VF2
CO                0.22     0.69
O3                0.15     0.75
PM10              0.12     0.84
SO2               0.01     0.01
NO2               0.06     0.66
CH4               0.95     0.02
NmHC              0.79     0.26
THC               0.99     0.06
Eigenvalue        3.12     2.02
Variability (%)   32.33    31.91
Cumulative (%)    32.33    64.24

Table 4 The prediction performance of R2 and RMSE for MLP-FF-ANN models A and B using the ANN technique

Model                  Hidden node   R2      RMSE
MLP-FF-ANN model A     1             0.465   11.822
                       2             0.509   11.321
                       3             0.543   10.927
                       4             0.559   10.723
                       5             0.591   10.335
                       6             0.595   10.283
                       7             0.615   10.026
                       8             0.612   10.059
                       9             0.575   10.685
                       10            0.568   10.756
MLP-FF-ANN model B     1             0.450   12.148
                       2             0.462   11.833
                       3             0.517   11.386
                       4             0.538   11.132
                       5             0.575   10.685
                       6             0.587   10.528
                       7             0.606   10.277
                       8             0.618   10.017
                       9             0.597   10.396
                       10            0.587   10.534
good alternative for establishing ANN models in air quality prediction. This approach saves not only time but also monitoring costs. The results therefore show that the feed-forward ANN architecture is able to predict API values from all available inputs with reasonable precision (R2 of about 0.62).

Fig. 5 Scatter plot diagram of the API prediction performance (actual by predicted plot): a original raw data (eight variables) and b rotated PCA scores

Fig. 6 a, b The performance of R2 and RMSE for MLP-FF-ANN models A and B (panel a: R2 against the number of hidden nodes; panel b: RMSE against the number of hidden nodes)

4 Conclusion

Air quality monitoring programs have generated a huge, multidimensional, and complex data set, which requires environmetric techniques for data analysis and interpretation of the underlying information. In this study, we applied PCA to identify the pollution sources responsible for air quality variation and to determine the optimum number of sampling parameters in a given area of Malaysia, even without a field visit. Bartlett's test of sphericity revealed that the air quality variables in the study area were correlated and not orthogonal. The Kaiser-Meyer-Olkin (KMO) test showed that the sampling adequacy was greater than 0.5, so all variables were considered adequate for further analysis. The two VFs generated by rotated PCA indicate that only five (CH4, NmHC, THC, O3, and PM10) out of eight parameters were significant and responsible for air quality variations in the study area. Based on the Malaysian Ministry of Transport data, it is believed that motor vehicles are one of the major factors that contribute to the formation of these pollutants. Thus, this study indicates that, for effective future management of Malaysian air quality, efforts to control point and non-point pollution sources should be prioritized.

The factor scores after varimax rotation were used as input variables in MLP-FF-ANN model B for API prediction, giving an R2 value of 0.618, which is marginally better than the R2 value (0.615) of MLP-FF-ANN model
A. The reduction of the amount of input data in model B enables a faster predictive outcome without significantly changing the predictive power compared with MLP-FF-ANN model A. This means that the factor scores after varimax rotation are more efficient and effective, because they reduce the number of predictor variables without losing important information. Additionally, the combination of factor scores after varimax rotation and ANN has proven to be a useful tool in air pollution modelling. Moreover, this technique can be a simple alternative model that provides reliable estimates of API using only limited information.
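
The model comparison drawn from Table 4 can be retraced with a short script. The following Python sketch is only an illustration: the R2 and RMSE values are transcribed from Table 4, while the dictionary layout and the function name are assumptions made for this example. It selects, for each model, the number of hidden nodes with the highest R2.

```python
# R2 and RMSE for 1 to 10 hidden nodes, transcribed from Table 4.
TABLE_4 = {
    "A": {
        "r2": [0.465, 0.509, 0.543, 0.559, 0.591, 0.595, 0.615, 0.612, 0.575, 0.568],
        "rmse": [11.822, 11.321, 10.927, 10.723, 10.335, 10.283, 10.026, 10.059, 10.685, 10.756],
    },
    "B": {
        "r2": [0.450, 0.462, 0.517, 0.538, 0.575, 0.587, 0.606, 0.618, 0.597, 0.587],
        "rmse": [12.148, 11.833, 11.386, 11.132, 10.685, 10.528, 10.277, 10.017, 10.396, 10.534],
    },
}

def best_structure(results):
    """Return (hidden_nodes, r2, rmse) for the structure with the highest R2."""
    index = max(range(len(results["r2"])), key=lambda i: results["r2"][i])
    return index + 1, results["r2"][index], results["rmse"][index]

best_a = best_structure(TABLE_4["A"])
best_b = best_structure(TABLE_4["B"])

for name, (nodes, r2, rmse) in (("A", best_a), ("B", best_b)):
    print(f"Model {name}: {nodes} hidden nodes, R2={r2:.3f}, RMSE={rmse:.3f}")
print(f"R2 difference (B - A): {best_b[1] - best_a[1]:.3f}")
```

In both models the structure with the highest R2 is also the one with the lowest RMSE, so either metric leads to the same choice of seven hidden nodes for model A and eight for model B.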
Acknowledgments The authors acknowledge the Air Quality Division of the Department of Environment (DOE) under the Ministry of Natural Resource and Environment, Malaysia, for permission to use the air quality data and for their advice, guidance, and support for this study.
reaction. Environmental Science and Technology, 32(17),
404A407A.
Brunelli, U., Piazza, U., & Pignato, L. (2007). Two-day ahead
References prediction of daily maximum concentrations of SO2, O3,
PM10, NO2, CO in the urban area of Palermo, Italy.
Atmospheric Environment, 41(14), 29672995. doi:10.
Abdul-Wahab, S. A., & Al-Alawi, S. M. (2002). Assessment and 1016/j.atmosenv.2006.12.013.
prediction of tropospheric ozone concentration levels using Bucinski, A., Baczek, T., Wasniewski, T., & Stefanowicz,
artificial neural networks. Environmental Modelling and M. (2005). Clinical data analysis with the use of arti-
Software, 17, 219228. ficial neural networks (ANN) and principal component
Abdul-Wahab, S. A., Bakheit, C. S., & Al-Alawi, S. M. (2005). analysis (PCA) of patients with endometrial carcinoma.
Principal component and multiple regression analysis in Reports on Practical Oncology & Radiotherapy, 10,
modelling of ground-level ozone and factors affecting its 239248.
concentrations. Environmental Modelling and Software, 20, Chaloulakou, A., Grivas, G., & Spyrellis, N. (2003). Neural net-
12631271. work and multiple regression model for PM10 prediction in
Afroz, R., Hassan, M. N., & Ibrahim, N. A. (2003). Review of air Athens: a comparative assessment. Journal Air & Waste
pollution and health impacts in Malaysia. Environmental Management Association, 53, 11831190. doi:10.1080/
Research, 92, 7177. doi:10.1016/S0013-9351(02)00059-2. 10473289.2003.10466276.
Afzali, M., Afzali, A., & Zahedi, G. (2012). The potential of Cigizoglu, H. K., & Kisi, O. (2006). Methods to improve the
artificial neural network technique in daily and monthly neural network performance in suspended sediment estima-
ambient air temperature prediction. International Journal of tion. Journal of Hydrology, 317(3), 221238.
Environmental Science Development, 3(1), 3338. Department of Environment (DOE) (2014). Main sources of air
Alam Sekitar Malaysia Sdn Bhd (ASMA). (2007). Standard op- pollution in Malaysia. http://apims.doe.gov.my/apims/
erating procedure for continuous air quality monitoring. General%20Info%20of%20Air%20Pollutant%20Index.pdf.
Selangor Malaysia: Shah Alam. Accessed 20 March 2014
Alkasassbeh, M., Sheta, A. F., Faris, H., & Turabieh, H. (2013). Department of Environment Malaysia (DOE). (2005). Malaysia
Prediction of PM10 and TSP air pollution parameters using environmental quality report, 2004. Kuala Lumpur: Ministry
artificial neural network autoregressive, external input of Science, Technology and Environment.
models: a case study in salt, Jordan. Middle East Journal of Department of Environment Malaysia (DOE). (2010). Malaysia
Scientific Research, 14(7), 9991009. doi:10.5829/idosi. environmental quality report, 2009. Kuala Lumpur: Ministry
mejsr.2013.14.7.2171. of Science, Technology and Environment.
Arsene, C., Olariu, R. I., & Mihalopoulos, N. (2007). Chemical Dominick, D., Juahir, H., Latif, M. T., Zain, S. M., & Aris, A. Z.
composition of rainwater in the Northeastern Romania, Iasi (2012). Spatial assessment of air quality patterns in Malaysia
Region (20032006). Atmospheric Environment, 41, 9452 using multivariate analysis. Atmospheric Environment, 60,
9467. 172181. doi:10.1016/j.atmosenv.2012.06.021.
Awang, M. B., Jaafar, A. B., Abdullah, A. M., Ismail, M. B., Dragomir, E. G. (2010). Air quality index prediction using K-
Hassan, M. N., Abdullah, R., Johan, S., & Noor, H. (2000). nearest neighbor technique. Bulletin of PG University of
Air quality in Malaysia: impacts, management issue and Ploiesti, Series Mathematics, Informatics, Physics,
future challenges. Respirology, 5, 183196. LXII(1/2010), 103108.
2063, Page 12 of 14 Water Air Soil Pollut (2014) 225:2063

Field, A. P. (2000). Discovering statistics using SPSS for Latif, M. T., Hey, L. S., & Juneng, L. (2012). Variations of surface
Windows. London-Thous and Oaks-New Delhi: Sage ozone concentration across the Klang Valley, Malaysia.
Publication. Atmospheric Environment, 61, 434445. doi:10.1016/j.
Field, A. P. (2005). Discovering statistics using SPSS (2nd ed.). atmosenv.2012.07.062.
London: Sage. Lau, J., Hung, W. T., & Cheung, C. S. (2009). Interpretation of air
Garca, I., Rodrquez, J. G., & Tenorio, Y. M. (2011). Artificial quality in relation to monitoring stations surroundings.
neural network models for prediction of ozone concen- Atmospheric Environment, 43, 769777.
trations in Guadalajara, Mexico. Air quality-models Liu, C. W., Lin, K. H., & Kuo, Y. M. (2003). Application of factor
and application. Nicolas Mazzeo (Ed.), ISBN:978 953-307- analysis in the assessment of groundwater quality in a black-
307-1, (pp. 35-52). foot disease area in Taiwan. Science of the Total
Giorgio, F., & Piero, M. (1996). Mathematical models for planning Environment, 313, 7789. doi:10.1016/S0048 9697(02)
and controlling air quality. Proceedings of IIASA Workshop, 00683-6.
17. Liu, G., Yi, Z., & Yang, S. (2007). A hierarchical intrusion detec-
Haiduc, I., & Beldean-Gale, M. S. (2011). Variation of greenhouse tion model based on the PCA neural networks.
gases in urban areas-case study: CO2, CO and CH4 in three Neurocomputing, 70, 15611568.
Romanian cities, air quality-models and applications. Prof. Mahboubeh, A., Afsaneh, A., & Gholamreza, Z. (2012). The
Dragana Popovic (Ed.), ISBN: 978953-307-307-1, InTech. potential of artificial neural network technique in daily and
http://www.intechopen.com/books/air-qualitymodelsand monthly ambient air temperature prediction. International
applications/variationofgreenhouse-gases-in-urban areas- Journal of Environmental Science and Development, 3(1),
case-study-co2-co-and-ch4-in three-romanian-cities. 3338.
Accessed 2 May 2013. Mahiyudin, W. R. W., Sahani, M., Aripin, R., Latif, M. T., Thach,
Hakimpoor, H., Arshad, K. A., Tat, H. H., Khani, N., & T. Q., & Wong, C. M. (2013). Short-term effects of daily air
Rahmandoust, M. (2011). Artificial neural networks appli- pollution on mortality. Atmospheric Environment, 65, 6979.
cation in management. World Applied Sciences Journal, doi:10.1016/j.atmosenv.2012.10.019.
14(7), 10081019. Maier, H. R., & Dandy, G. C. (2001). Neural network based
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer modelling of environmental variables: a systematic approach.
feed-forward networks are universal approximators. Neural Mathematical and Computer Modelling, 33, 669682.
Networks, 2, 359366. Mazrura, S., Wan Mahiyuddin, W. R., Aripin, R., Mohd Latif, T.,
Jain, A. K., Mao, J., & Mohiuddin, K. M. (1996). Artificial neural Thach, T. Q., & Wong, C.-M. (2011). Health risk of air pollu-
networks: a tutorial. Computer, 29(3), 3144. tion on mortality in Klang Valley, Malaysia. Epidemiology,
Jamal, H. H., Pillay, M. S., Zailina, H., Shamsul, B. S., Sinha, K., 22(1), 159.
Huri, Z. Z., Khew, S. L., Mazrura, S., Ambu, S., Rahimah, McNeil, V. H., Cox, M. E., & Preda, M. (2005). Assessment of
A., & Ruzita, M. S. (2004). A study of health impact & risk chemical water types and their spatial variation using multi-
assessment of urban air pollution in Klang Valley. Malaysia, stage cluster analysis, Queensland, Australia. Journal of
Kuala Lumpur: UKM Pakarunding Sdn Bhd. Hydrology, 310(14), 181200.
Juneng, L., Latif, M. T., Tangang, F. T., & Mansor, H. (2009). Monteiro, A., Strunk, A., Carvalho, A., Tchepel, O., Miranda, A.
Spatiotemporal characteristics of PM10 concentration across I., Borrego, C., Saavedra, S., Rodrguez, A., Souto, J.,
Malaysia. Atmospheric Environment, 45, 43704378. Casares, J., Friese, E., & Elbern, H. (2012). Investigating a
Junninen, H., Niska, H., Tuppurainen, K., Ruuskanen, J., & high ozone episode in a rural mountain site. Environmental
Kolehmainen, M. (2004). Methods for imputation of missing Pollution, 162, 176189.
values in air quality data set. Atmospheric Environment, 38, Moustris, K. P., Ziomas, I. C., & Paliatsos, A. G. (2010). 3-day-
28952907. doi:10.1016/j.atmosenv.2004.02.026. ahead forecasting of regional pollution index for the pollut-
Kaiser, H. F. (1958). The varimax criterion for analytical rotation ants NO2, CO, SO2, and O3 using artificial neural networks in
in factor analysis. Psychometrika, 23(3), 187200. doi:10. Athens, Greece. Water, Air, & Soil Pollution, 209(1), 2943.
1007/BF02289233. doi:10.1007/s11270-009-0179-5.
Kamal, M. M., Jailani, R., & Shauri, R. L. A. (2006). Prediction of Mutalib, S. N. S. A., Juahir, H., Azid, A., Sharif, S. M., Latif, M.
ambient air quality based on neural network technique. 4th T., Aris, A. Z., Zain, S. M., & Dominick, D. (2013). Spatial
Student Conference on Research and Development, 115-119, and temporal air quality pattern recognition using
doi:10.1109/SCORED.2006.4339321. environmetric techniques: a case study in Malaysia.
Kim, J. O., & Mueller, C. W. (1987). Introduction to factor Environmental Science, Processes & Impacts 1-12, doi: 10.
analysis: what it is and how to do it. Quantitative applica- 1039/c3em00161j.
tions in the social science series. Newbury Park: Sage Niska, H., Hiltunen, T., Karppinen, A., Ruu Skanen, J., & Koleh
University Press. Mainen, T. (2004). Evolving the neural network model for
Koppmann, R. (2007). Volatile organic compounds in the atmo- forecasting air pollution time series. Engineering
sphere. Singapore: Blackwell Publishing Ltd. Applications of Artificial Intelligence, 17, 159.
Kurt, A., Gulbagci, B., Karaca, F., & Alagha, O. (2008). An online Niska, H., Rantamaki, M., Hiltunen, T., Karppinen, A., Hukko Nen,
air pollution forecasting system using neural networks. J., Ruu Skanen, J., & Koleh-Mainen, M. (2005). Evaluation of
Environment International, 34(5), 592598. doi:10.1016/j. an integrated modelling system containing a multi-layer
envint.2007.12.020. perceptron model and the numerical weather prediction model
Water Air Soil Pollut (2014) 225:2063 Page 13 of 14, 2063

HURLEM for the forecasting of urban airborne pollutant monitoring data. Analytical and Bioanalytical Chemistry,
concentrations. Atmospheric Environment, 39, 6524. 374, 898905.
Norusis, M. J. (1990). SPSS base system users guide. Chicago, IL, Simeonov, V., Stratis, J. A., Samara, C., Zachariadis, G., Voutsa,
USA: SPSS. D., Anthemidis, A., Sofoniou, M., & Kouimtzis, T. (2003).
OFarrella, M., Lewisa, E., Flanagana, C., Lyonsa, W. B., & Assessment of the surface water quality in Northern Greece.
Jackman, N. (2005). Combining principal component analy- Water Research, 37, 41194224.
sis with an artificial neural network to perform online quality Singh, K. P., Malik, A., Mohan, D., & Sinha, S. (2004).
assessment of food as it cooks in a large scale industrial oven. Multivariate statistical techniques for the evaluation of spatial
Sensors and Actuators B, 107, 104112. and temporal variations in water quality of Gomti River
Othman, N., Mat Jafri, M. Z., & San, L. H. (2010). Estimating (India)a case study. Water Research, 38, 39803992. doi:
particulate matter concentration over arid region using satel- 10.1016/j.watres.2004.06,011.
lite remote sensing: a case study in Makkah, Saudi Arabia. Singh, K. P., Malik, A., & Sinha, S. (2005). Water quality assess-
Modern Applied Science, 4(11), 131142. ment and apportionment of pollution sources of Gomti River
Pandey, S., Tripathi, B., & Mishra, V. (2008). Dust deposition in a (India) using multivariate statistical techniquesa case
sub-tropical opencast coalmine area, India. Journal of study. Analytica Chimica Acta, 538, 355374. doi:10.1016/
Environmental Management, 86, 132138. j.aca.2005.02.006.
Perez, P., & Reyes, J. (2006). An integrated neural network model Skrbi, C. B., Duri, S. I., & C-Mladenovi, C. N. (2007). Principal
for PM10 forecasting. Atmospheric Environment, 40, 2845 component analysis for soil contamination with organochlo-
2851. doi:10.1016/j.atmosenv.2006.01.010. rine compounds. Chemosphere, 68, 21442152.
Pires, J. C. M., Sousa, S. I. V., Pereira, M. C., Alvim-Ferraz, M. C. Sohn, S. H., Oh, S. C., & Yeo, Y. K. (1999). Prediction of air
M., & Martins, F. G. (2008a). Management of air quality pollutants by using an artificial neural network. Korean
monitoring using principal component and cluster. Analysis Journal of Chemical Engineering, 16(3), 382387. doi:10.
part I: SO2 and PM10. Atmospheric Environment, 42, 1249 1007/BF02707129.
1260. Sousa, S. I. V., Martins, F. G., Alvim-Ferraz, M. C. M., & Pereira, M.
Pires, J. C. M., Sousa, S. I. V., Pereira, M. C., Alvim-Ferraz, M. C. C. (2007). Multiple linear regression and artificial neural net-
M., & Martins, F. G. (2008b). Management of air quality works based on principal components to predict ozone concen-
monitoring using principal component and cluster. trations. Environmental Modelling & Software, 22, 97103.
Analysispart II: CO, NO 2 and O 3 . Atmospheric Sun, G., Hoff, S. J., Zelle, B. C., & Nelson, M. A. (2008).
Environment, 42, 12611274. Development and comparison of backpropagation and gen-
Pires, J. C. M., Sousa, S. I. V., Pereira, M. C., Alvim-Ferraz, M. C. eralized regression neural network models to predict diurnal
M., & Martins, F. G. (2009). Identification of redundant air and seasonal gas and PM10 concentrations and emissions
quality measurements through the use of principal compo- from swine buildings. American Society of Agricultural and
nent analysis. Atmospheric Environment, 43, 38373842. Biological Engineers, 51(2), 685694.
Rahman, N. H. A., Lee, M. H., Latif, M. T., & Suhartono. (2013). Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate
Forecasting of air pollution index with artificial neural net- statistics (4th ed.). New York: Harper & Row.
work. Jurnal Teknologi, 63(2), 5964. Tecer, L. H. (2007). Prediction of SO2 and PM concentrations in a
Ravi, V., & Pramodh, C. (2008). Threshold accepting trained coastal mining area (Zonguldak, Turkey) using an artificial
principal component neural network and feature subset se- neural network. Polish Journal of Environmental Studies,
lection: application to bankruptcy prediction in banks. 16(4), 633638.
Applied Soft Computing, 8, 15391548. Viana, M., Querol, X., Alastuey, A., Gil, J. I., & Menndez, M.
Rech, G. (2002). Forecasting with artificial neural network (2006). Identification of PM sources by principal component
models. SSE/EFI working paper series in economics and analysis (PCA) coupled with wind direction data.
finance 491, Stockholm School of Economics. Chemosphere, 65, 24112418.
Sadanaga, Y., Sengen, M., Takenaka, N., & Bandow, H. (2012). Wei, X., Liu, Q., Lam, K. S., & Wang, T. (2012). Impact of
Analyses of the ozone weekend effect in Tokyo, Japan: precursor levels and global warming on peak ozone concen-
regime of oxidant (O3 +NO2) production. Aerosol & Air tration in the Pearl River Delta Region of China. Advances in
Quality Research, 12, 161168. Atmospheric Sciences, 29, 635645.
Seinfeld, J. H., & Pandis, S. N. (1998). Atmospheric chemistry and Wei-Zhen, L., Hong-Di, H., & Li-Yun, D. (2011). Performance
physics. United States: John Wiley & Sons Inc. assessment of air quality monitoring networks using principal
Shrestha, S., & Kazama, F. (2007). Assessment of surface water component analysis and cluster analysis. Building and
quality using multivariate statistical techniques: a case study Environment, 46, 577583.
of the Fuji river basin, Japan. Environmental Modelling & Xie, H., Ma, F. & Bai, Q. (2009). Prediction of indoor air quality
Software, 22, 464475. doi:10.1016/j.envsoft.2006.02.001. using artificial neural networks. 2009 Fifth International
Siew, L. Y., Chin, L. Y., & Wee, P. M. J. (2008). ARIMA and Conference on Natural Computation, 412418.
integrated ARFIMA models for forecasting air pollution Yahaya, N., Ali, A., & Ishak, F. (2006) Air pollution index (API)
index in Shah Alam, Selangor. Malaysian Journal of and the effects on human health: case study in Terengganu
Analytical Sciences, 12(1), 257263. City, Terengganu, Malaysia. Paper submitted to the
Simeonov, V., Einax, J. W., Stanimirova, I., & Kraft, J. (2002). International Association for People Environmental Studies
Environmetric modeling and interpretation of river water (IAPS) Conference, September 2006, Alexandria, Egypt.
2063, Page 14 of 14 Water Air Soil Pollut (2014) 225:2063

Yetilmezsoy, K., & Demirel, S. (2007). Artificial neural network Zelkic-Susac, M., Sarlija, N., & Pfeifer, S. (2013). Combining
(ANN) approach for modeling of Cd(II) adsorption from PCA analysis and artificial neural networks in modelling
aqueous solution by Antep Pistachio (Pistacia Vera L.) shells. entrepreneurial intentions of students. Crotian Operational
Journal of Hazardous Materials, 153, 12881300. Research Review, 4, 306317.
Zekic-Susac, M., Pfeifer, S., & Djurdjevic, I. (2010). Classification Zhang, G., Eddy Patuwo, B., & Hu, M. Y. (1998). Forecasting
of entrepreneurial intentions by neural networks, decision with artificial neural networks: the state of the art.
trees and support vector machines. Croatian Operational International Journal of Forecasting, 14, 3562. doi:10.
Research Review, 1, 6273. 1016/S0169-2070(97)00044-7.

Vous aimerez peut-être aussi