Keywords: Vehicular pollution model, training data, supervised learning

Introduction
Air quality models form one of the most important components of an urban air quality management plan. An effective air quality management system must be able to provide the authorities with information on current and likely future trends, enabling them to make the necessary assessments regarding the extent and type of air pollution control strategies to be implemented throughout the area. Various statistical modeling techniques (regression, multiple regression and time series analysis) have been used to predict air pollution concentrations in the urban environment. These models calculate pollution concentrations from observed traffic, meteorological and pollution data, after an appropriate relationship has been obtained empirically between these parameters. Recently, statistical modeling tools such as the artificial neural network (ANN) have increasingly been used as an alternative for modeling pollutants from vehicular traffic, particularly in urban areas. In the present paper, a review of the applications of ANN in vehicular pollution modeling under urban conditions is presented, and the basic features of ANN and the modeling philosophy, including performance evaluation criteria for ANN based vehicular emission models, are described.

Artificial Neural Networks (ANNs) began with the pioneering work of McCulloch & Pitts and have their roots in a rich interdisciplinary history from the early 1940s. Hebb2 proposed a learning scheme for updating the synaptic strength between neurons. His famous 'postulate of learning', which is referred to as the 'Hebbian learning' rule, stated that information can be stored in synaptic connections and that the strength of a synapse would increase with the repeated activation of one neuron by another across that synapse. Rosenblatt and Block et al. proposed a neuron-like element called the 'perceptron' and provided a learning procedure for it. They further developed the 'perceptron convergence procedure', an advancement over the 'Hebb rule' for changing synaptic connections. Minsky & Papert demonstrated the limitations of the single layer perceptron. Nilsson showed that Multilayer Perceptrons (MLPs) can be used to separate patterns nonlinearly in a hyperspace, and that the perceptron convergence theorem applies only to the single layer perceptron. Rumelhart et al. presented the conceptual basis of back-propagation, which can be considered a giant step forward compared to its predecessor, the perceptron. Flood & Kartam reviewed various applications of ANN in civil engineering. Dougherty reviewed applications of ANN in transportation engineering. Godbole reviewed applications of ANN in wind engineering and reported two separate studies in which the ANN technique simulated experimental results obtained during wind tunnel studies to determine wind pressure distribution on low buildings. Gardner & Dorling reviewed applications of ANN in the atmospheric sciences and concluded that neural networks generally give as good or better results than linear methods.
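Hebb's 'postulate of learning' mentioned above can be illustrated with a minimal sketch (the learning rate eta and the two-neuron setup are illustrative assumptions, not part of the historical rule): the synaptic weight grows whenever the neurons on both sides of the synapse are active together.

```python
# Minimal sketch of the Hebbian learning rule: dw = eta * x * y, i.e. a
# synapse is strengthened when the pre-synaptic activity x and the
# post-synaptic activity y are high at the same time.

def hebbian_update(w, x, y, eta=0.1):
    """Return the synaptic weight after one Hebbian update."""
    return w + eta * x * y

w = 0.0
# Repeated joint activation of the two neurons strengthens the synapse.
for _ in range(5):
    w = hebbian_update(w, x=1.0, y=1.0)
print(w)  # the weight has grown from its initial value of 0.0
```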
Basic Features of ANN

Biological Neuron
A human brain consists of a large number of computing elements (about 10^11) called 'neurons', which are the fundamental units of the biological nervous system (Fig. 1). A neuron is a simple processing unit, which receives and processes signals from other neurons through its paths, called 'dendrites'. The activity of a neuron is an all-or-none process, i.e., it is a binary type of process. Only if the combined signal is strong enough does the neuron generate an output signal, which is transmitted through the axon to a junction referred to as a 'synapse'. The signals, as they pass through the network, create different levels of activation in the neurons. The amount of signal transferred depends upon the synaptic strength of the junction. Synaptic strength is modified during the learning processes of the brain, and can therefore be considered the memory unit of each interconnection. Identification or recognition depends on the activation levels of the neurons.

Multilayer Perceptron
A multilayer ANN, or MLP (Fig. 2), consists of a system of layered, interconnected 'neurons' or 'nodes' arranged to form three layers: an 'input' layer, one or more 'hidden' layers, and an 'output' layer, with the nodes in each layer connected to the nodes in the neighboring layers. The output of a node is scaled by the connecting weights and fed forward as input to the nodes in the next layer of the network, implying a direction of information processing. Hence, the MLP is also known as a feed-forward neural network. The input layer passes the input vector to the network. An MLP 'can approximate any smooth measurable function between the input and output vectors'. An MLP has the ability to learn through training, which requires a set of training data consisting of a series of input and associated output vectors.

Transfer Function
The transfer function is the mechanism for translating input signals to output signals for each processing element. The three main types of transfer or activation functions are linear, threshold and sigmoid functions. A linear function produces an output signal in the node that is directly proportional to the activity level of the node. A threshold function produces an output signal that is constant until a certain level of activity is reached, whereupon the output changes to a new constant level until the activity drops back below the threshold. A sigmoid function produces an output that varies continually with the activity level, but the variation is not linear. The sigmoid transfer function has a graph similar to a stretched S and comes in two forms - the hyperbolic tangent function (values between -1 and +1) and the logistic function (values between 0 and +1). The superposition of various non-linear transfer functions enables the MLP to approximate the behavior of non-linear functions.

Training a MLP - the Back-Propagation Algorithm
A 'learning' process, or 'training', forms interconnections (correlations) between neurons. The learning algorithm in a neural network specifies the rules or procedures that tell a node in the ANN how to modify its weight distribution in response to an input pattern. During training, the MLP is repeatedly presented with the training data, and the weights in the network are adjusted until the desired input-output mapping occurs. An ANN can be trained by 'supervised' or 'unsupervised' learning. Supervised learning involves providing samples of the input patterns that the network needs to learn, along with the output patterns that the network should produce when it encounters those patterns. Unsupervised learning involves providing the network with input patterns but not the desired output patterns; the desired response is not known (similar to students learning without a teacher), so explicit error information cannot be used to improve network behavior.

MLPs learn in a supervised manner21,22. During training of the MLP, the output corresponding to a given input vector may not reach the desired level. The magnitude of the error signal, defined as the difference between the desired and actual output, is used to determine to what degree the weights in the network should be adjusted, so that the overall error can be reduced to the desired (acceptable) level. Back-propagation is a supervised learning model and is the most widely used method for training MLPs. Generally, the back-propagation learning algorithm is applied in two basic steps: (i) feed-forward calculation; and (ii) error back-propagation calculation.
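As an illustration, the three families of transfer functions described above can be sketched in Python (the specific threshold level and output constants chosen here are illustrative assumptions):

```python
import math

def linear(x, k=1.0):
    """Output directly proportional to the activity level of the node."""
    return k * x

def threshold(x, level=0.0):
    """Constant output that jumps to a new constant once activity passes the level."""
    return 1.0 if x >= level else 0.0

def logistic(x):
    """Sigmoid with outputs in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_sigmoid(x):
    """Hyperbolic-tangent sigmoid with outputs in the open interval (-1, +1)."""
    return math.tanh(x)

print(logistic(0.0))       # 0.5 -- midpoint of the (0, 1) range
print(tanh_sigmoid(0.0))   # 0.0 -- midpoint of the (-1, +1) range
```

The two sigmoid variants differ only in their output range, which is why input data are usually rescaled to match the activation range (see the normalization discussion later in the paper).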
In supervised learning, external prototypes are used as the target outputs for specific inputs, and the network is given a learning algorithm to follow and calculate new connection 'weights' that bring the output closer to the target output. The inputs are fed into the input layer and get multiplied by the interconnection weights as they are passed from the input layer to the first hidden layer. Within the first hidden layer, they get summed and then processed by a nonlinear function. As the processed data leaves the first hidden layer, it again gets multiplied by interconnection weights, then summed and processed by the second hidden layer. Finally, the data is multiplied by interconnection weights and processed one last time within the output layer to produce the ANN output. With each presentation, the output of the ANN is compared to the desired output and an error, measured as the mean square error, is computed. This error is then fed back (back-propagated) to the ANN and used to adjust the weights such that the error decreases with each iteration, and the neural model gets closer and closer to producing the desired output (Fig. 3). The training cycle is continued until the network achieves the desired level of tolerance. Properly trained back-propagation networks tend to give reasonable answers when presented with inputs that the network has never seen before. In this way, an ANN can be used as an effective prediction model for a variety of problems, including air pollution related phenomena.

Feed-forward computation
The input vector (representing the patterns to be recognized) is incident on the input layer and is distributed to the subsequent hidden layers, and finally to the output layer, via weighted connections (Fig. 4). Each neuron in the network operates by taking the sum of its weighted inputs and passing the result through a nonlinear activation function. The net input (NETpj) to a hidden unit is described as:

NETpj = Σ(i=1 to n) Wji Ii + Ψj

where i = 1, 2, ..., n and j = 1, 2, ..., H; Wji = weight from neuron i (source) to node j (destination); Ii = input value of neuron i; and Ψj = bias value for the jth hidden layer neuron. The output of a hidden unit Hj, as a function of its net input, is given by:

Hj = g(NETpj)

where g is a sigmoid function. The net input (NETpk) to each output layer unit and the output (Opk) from each output layer unit are calculated in an analogous manner by the following equations:

NETpk = Σ(j=1 to H) Wkj Hj + Ψk
Opk = g(NETpk)

where Wkj = weight from node j to node k, k = 1, 2, ..., and Ψk = bias value for the kth output layer neuron. This set of calculations provides the output state of the network and is carried out in the same way in the training as well as the testing phase. The test mode just involves presenting the input units and calculating the resulting output state in a single forward pass.
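The feed-forward calculation above can be sketched for a single hidden layer (the network size, weight and bias values are illustrative assumptions, and a logistic sigmoid is assumed for g):

```python
import math

def g(net):
    """Logistic sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-net))

def forward(inputs, w_hidden, b_hidden, w_output, b_output):
    """One forward pass: input layer -> hidden layer -> output layer."""
    # Hidden layer: NETpj = sum_i(Wji * Ii) + PSIj, then Hj = g(NETpj).
    hidden = [g(b + sum(w * x for w, x in zip(row, inputs)))
              for row, b in zip(w_hidden, b_hidden)]
    # Output layer: NETpk = sum_j(Wkj * Hj) + PSIk, then Opk = g(NETpk).
    return [g(b + sum(w * h for w, h in zip(row, hidden)))
            for row, b in zip(w_output, b_output)]

# Illustrative 2-input -> 2-hidden-unit -> 1-output network.
out = forward(inputs=[0.5, -0.2],
              w_hidden=[[0.1, 0.4], [-0.3, 0.2]],
              b_hidden=[0.0, 0.1],
              w_output=[[0.7, -0.5]],
              b_output=[0.2])
print(out)  # a single sigmoid output, which always lies in (0, 1)
```

Exactly the same pass is used in both training and testing; only the weight adjustment that follows it differs.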
Error back-propagation
A measure of the closeness of the network output to an established desired value is the network error. Since the network deals with supervised training, the desired value is known for the given training set. For the back-propagation learning algorithm, an error measure known as the mean square error is used, which is defined as:

Ep = (1/2) Σj (Tpj - Opj)^2

where Tpj = target (desired) value of the jth output unit for pattern p, and Opj = actual output obtained from the jth output unit for pattern p.

Standard back-propagation uses the gradient descent algorithm, in which the network weights are moved along the negative of the gradient of the performance function. This rule is based on the simple idea of continuously modifying the strengths of the input connections to reduce the difference (δ) between the desired output value and the actual output of a processing element. It changes the synaptic weights in the way that minimizes the mean square error of the network, and is also referred to as the Least Mean Square (LMS) learning rule.

The back-propagation algorithm can be summarized in the following seven steps: (i) initialize the network weights; (ii) present the first input vector, from the training data, to the network; (iii) propagate the input vector through the network to obtain an output; (iv) calculate an error signal by comparing the actual output to the desired (target) output; (v) propagate the error signal back through the network; (vi) adjust the weights to minimize the overall error; and (vii) repeat steps (ii) to (vi) with the next training input vector, until the overall error is satisfactorily small.

There are two ways of pattern presentation and weight adjustment in the network: (i) one way involves propagating the error back and adjusting the weights after each training pattern is presented (single pattern training); and (ii) the other way is epoch training, in which one full presentation of all patterns in the training set is termed an epoch, and the error is back-propagated based on the total network error. The relative merits of each type of training have been described elsewhere. In practice, hundreds of training iterations are required before the network error is reduced to the desired level.

Selection of Initial Weights
Before starting ANN training, initialization of the ANN weights and biases (the free parameters) is required. A good choice for the initial values of the synaptic weights and biases of the network can be very helpful in obtaining fast convergence of the training process. If no proper information is available, then all free parameters of the network are set to random numbers that are uniformly distributed over a small range of values. When the weights associated with a neuron grow sufficiently large, the neuron operates in the region where the activation function approaches its limits (1 or 0 for the sigmoid function). In this situation, the derivative of the activation function (DAF) becomes extremely small.
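The seven back-propagation steps listed above can be sketched for a tiny network (the 2-2-1 architecture, learning rate, iteration count and the logical-OR training task are illustrative assumptions; the delta rule with a logistic sigmoid is assumed):

```python
import math
import random

random.seed(0)

def sig(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

# Step (i): initialize the weights (and biases) to small uniform random numbers.
w_h = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
b_h = [random.uniform(-0.5, 0.5) for _ in range(2)]
w_o = [random.uniform(-0.5, 0.5) for _ in range(2)]
b_o = random.uniform(-0.5, 0.5)

# Illustrative training set: the logical OR function.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
lr = 0.5  # learning rate (illustrative value)

def sse():
    """Sum of squared errors of the network over the whole training set."""
    total = 0.0
    for x, t in data:
        h = [sig(b + sum(w * xi for w, xi in zip(row, x))) for row, b in zip(w_h, b_h)]
        o = sig(b_o + sum(w * hi for w, hi in zip(w_o, h)))
        total += (t - o) ** 2
    return total

err_before = sse()
for _ in range(5000):                 # many passes over the training set
    for x, t in data:                 # step (ii): present an input vector
        # Step (iii): propagate it through the network to obtain an output.
        h = [sig(b + sum(w * xi for w, xi in zip(row, x))) for row, b in zip(w_h, b_h)]
        o = sig(b_o + sum(w * hi for w, hi in zip(w_o, h)))
        # Step (iv): error signal = desired (target) output minus actual output.
        delta_o = (t - o) * o * (1 - o)
        # Step (v): propagate the error signal back to the hidden layer.
        delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
        # Step (vi): adjust the weights to reduce the overall error,
        # then step (vii): repeat with the next training input vector.
        for j in range(2):
            w_o[j] += lr * delta_o * h[j]
            b_h[j] += lr * delta_h[j]
            for i in range(2):
                w_h[j][i] += lr * delta_h[j] * x[i]
        b_o += lr * delta_o
err_after = sse()
print(err_before, "->", err_after)  # the overall error falls during training
```

This sketch uses single pattern training (the weights are adjusted after every pattern); epoch training would instead accumulate the error over all four patterns before adjusting.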
When the DAF approaches zero, the weight adjustment (made through back-propagation) also approaches zero, which results in ineffective training.

Normalization of the Training Data Set
In many ANN software packages, normalization (rescaling of the input data between 0 and 1) of the training data set is required before presenting it to the network for learning, so that the data satisfy the activation function range. Normalization is also necessary if there is a wide difference between the ranges of the input values. Normalization enhances the learning speed of the network and avoids the possibility of early network saturation.

Limitations of ANN Technique
Some of the limitations of ANN techniques are as follows: long training times; the large amount of training data required; no guarantee of optimal results; and no guarantee of 100% reliability - although this is true for any computational application, it is particularly true for ANNs trained with limited data. In addition, a good set of input variables must be found: the selection of input variables that give the proper input-output mapping is often difficult, since it is not always easy to determine the input variables, or the form of those variables, that give the best results, and some trial and error is required in selecting them.
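The min-max rescaling described under 'Normalization of the Training Data Set' above can be sketched as follows (the traffic-volume figures are made-up illustrative values):

```python
def normalize(values):
    """Rescale a list of raw input values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# An input variable with a wide raw range, e.g. traffic volume in vehicles/hour.
traffic_volume = [1200.0, 4500.0, 3000.0, 7800.0]
scaled = normalize(traffic_volume)
print(scaled)  # every value now lies between 0 and 1
```

Each input variable is rescaled independently, so a variable with a large raw range (such as traffic volume) no longer dominates one with a small range (such as wind speed), and the values stay within the sigmoid activation range.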
Application of ANN in Vehicular Emission Modeling
Recently, ANN has been increasingly used as an alternative tool for modeling pollutants from vehicular traffic, particularly in urban areas. Using the inputs, a neural network model develops its own internal model and subsequently predicts the output. The MLP structure of neural networks appears to be the most suitable for applications in the atmospheric sciences, particularly for predicting vehicular exhaust emissions (VEEs). Moseholm et al. employed an MLP to estimate CO concentration levels at an urban intersection. Chelani et al. used a three layered neural network to predict SO2 concentrations in Delhi; the results indicated that the neural networks were able to give better predictions than multivariate regression models. Gardner & Dorling developed an MLP model for forecasting hourly NOx and NO2 concentrations in the city of London. Perez et al. showed a three-layer neural network to be a useful tool for predicting PM2.5 concentrations in the atmosphere of downtown Santiago (Chile) several hours in advance, when hourly concentrations of the previous day are used as input. In a follow-up study, Perez & Trier employed an MLP to estimate NO and NO2 concentrations near a street with heavy traffic in Santiago, Chile; the predicted NO concentrations, in conjunction with forecasted meteorological data, were used to predict NO2 concentrations with reasonable accuracy. Yi & Prybutok and Comrie described MLP models that predicted surface ozone concentrations; the results from the MLPs were better than those from regression analysis using the same input data. Several others have reported that the ANN technique has been employed to predict surface level ozone concentrations as a function of meteorological and various air quality parameters. The development of ANN based models to predict ozone concentrations is based on the realization that prediction from detailed atmospheric diffusion models is difficult, because the meteorological variables and photochemical reactions involved in ozone formation are very complex. In contrast, neural networks are useful for ozone modeling because of their ability to be trained using historical data and their capability for modeling highly non-linear relationships.

Performance Evaluation of ANN Based Vehicular Emission Models
The evaluation of the performance of a vehicular emission model is a matter of great interest, and it becomes particularly important in all those fields in which air quality modeling is used as a decision making tool. Various regulatory and government agencies increasingly, but not exclusively, rely on these vehicular emission models to formulate effective air pollution management strategies. Efforts have been made to calibrate and evaluate the performance of these models, so that they represent the actual field conditions and the results obtained from them are accurate and realistic. Researchers have used different techniques to evaluate the performance of these air pollution models, sometimes leading to different results and interpretations, creating doubts not only about the applicability and reliability of the models but also about the performance evaluation techniques themselves. The need for a thorough discussion and universal guidelines on the applicability of various air quality models has long been felt. Unfortunately, however, a standard evaluation procedure, as well as universally accepted performance standards, still do not exist.

Statistical Analysis
Fox suggested various statistical (residual as well as correlation) parameters for evaluating the performance of air quality models. These included the mean bias error (MBE), time correlation and cross correlation coefficients, etc. All these parameters can be computed from the observed concentrations (Oi) and the predicted values (Op). Willmott argued that the commonly used correlation measures, such as r and r2, and tests of statistical significance, as suggested by Fox, are in general often inappropriate or misleading when used to compare model predicted (P) and observed (O) variables. Willmott further suggested the use of the index of agreement (d) and the root mean square error (RMSE) instead of the statistical parameters recommended by Fox. The d is a descriptive statistic, which reflects the degree to which the observed values are accurately estimated by the predicted values. Further, d is not a measure of correlation or association in the formal sense, but rather a measure of the degree to which the model predictions are error free. It varies between 0 and 1; a computed value of 1 indicates perfect agreement between the observed and predicted values, while a value of 0 denotes complete disagreement.

The RMSE has been further divided into two components - (i) systematic (RMSEs), also known as the model-oriented error, and (ii) unsystematic (RMSEu), also known as the data-oriented error. The RMSEs is based on the difference between the expected predictions and the actual observations, while the RMSEu is based on the difference between the actual and the expected predictions.

Conclusions
In recent years, feed-forward ANNs trained with back-propagation have become a popular and useful tool for modeling various environmental systems, including applications in the area of air pollution. The suitability of ANNs for modeling complex systems has resulted in their popularity and application in an ever increasing number of areas. However, care should be taken, as ANNs perform well in cases of interpolation, whereas their reliability and accuracy are highly questionable if they are used for extrapolation purposes. A careful interpretation of the results may also give an idea of the relative importance of the various input variables, which may lead to a better understanding of the problem if ANNs are used in conjunction with other modeling techniques.
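The evaluation statistics discussed above can be sketched as follows (the observed/predicted sample values are illustrative; the RMSEs/RMSEu split is computed here from a least-squares line fitted between observations and predictions, which is one common reading of the 'expected predictions' in Willmott's formulation):

```python
import math

def index_of_agreement(obs, pred):
    """Willmott's d = 1 - sum((P-O)^2) / sum((|P-Obar| + |O-Obar|)^2).
    d = 1 means perfect agreement; d = 0 means complete disagreement."""
    obar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - obar) + abs(o - obar)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def rmse_split(obs, pred):
    """Systematic (model-oriented) and unsystematic (data-oriented) parts of
    the RMSE, using the least-squares line P_hat = a + b*O as the 'expected'
    predictions: RMSEs from (P_hat - O), RMSEu from (P - P_hat)."""
    n = len(obs)
    obar, pbar = sum(obs) / n, sum(pred) / n
    b = (sum((o - obar) * (p - pbar) for o, p in zip(obs, pred))
         / sum((o - obar) ** 2 for o in obs))
    a = pbar - b * obar
    phat = [a + b * o for o in obs]
    rmse_s = math.sqrt(sum((ph - o) ** 2 for ph, o in zip(phat, obs)) / n)
    rmse_u = math.sqrt(sum((p - ph) ** 2 for p, ph in zip(pred, phat)) / n)
    return rmse_s, rmse_u

# Illustrative observed and model-predicted concentrations.
obs = [10.0, 14.0, 18.0, 22.0]
pred = [11.0, 13.5, 18.5, 21.0]
d_val = index_of_agreement(obs, pred)
rmse_val = rmse(obs, pred)
rmse_s, rmse_u = rmse_split(obs, pred)
print(d_val, rmse_val, rmse_s, rmse_u)
```

With this decomposition, RMSE^2 = RMSEs^2 + RMSEu^2, so a model with a large systematic component can in principle be improved by recalibration, while the unsystematic component reflects scatter in the data.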