
Module 1, Lecture 2

System Identification

Arun K. Tangirala
Department of Chemical Engineering, IIT Madras

July 27, 2013


Contents of Lecture 2

The following are the topics of interest in this lecture:
- Quick introduction to models
- Step-wise procedure for identification
- Guidelines for obtaining good models


Two classes of models


Models can be broadly classified into two categories, namely, qualitative and quantitative models. Qualitative models describe the system only on a qualitative basis, whereas quantitative models describe it on a numerical (mathematical/statistical) basis.

Examples

Qualitative:
- Increase in coolant flow rate reduces temperature
- Strain is directly proportional to stress
- Increase in fuel flow to an engine increases the speed

Quantitative:
(i) $y(t) = b\,u(t) + c$
(ii) $y[k] = a_1\,y[k-1] + a_2\,y[k-2] + b\,u[k-2]$
(iii) $y(t) = A\,e^{b\,u(t-D)}$
(iv) $\dfrac{dy(t)}{dt} + a_1\,y(t) = b\,u(t)$
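As a rough illustration (the coefficient values here are hypothetical, chosen only to make the sketch concrete), the difference-equation model (ii) can be simulated in a few lines of Python:

```python
import numpy as np

# Hypothetical coefficients for model (ii):
#   y[k] = a1*y[k-1] + a2*y[k-2] + b*u[k-2]
a1, a2, b = 0.5, -0.2, 1.0

N = 50
u = np.ones(N)        # unit step input
y = np.zeros(N)
for k in range(2, N):
    y[k] = a1 * y[k-1] + a2 * y[k-2] + b * u[k-2]

# Steady state approaches b / (1 - a1 - a2) ≈ 1.43 for this choice
print(y[-1])
```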


Approaches to modelling
The focus of this course, and of the identification literature in general, is on the development of quantitative models. As we learnt in the previous lecture, there are basically two approaches to modelling: (i) from fundamentals (first-principles models) and (ii) from experiments (empirical models).

Which one is better? There is no single correct answer. The choice is governed by the availability of process knowledge, the end-use of the model and a few other factors.


First-principles vs. Empirical Models


First-principles:
- Causal, continuous, non-linear differential-algebraic equations
- Model structures are quite rigid: the structure is decided by the equations describing the fundamental laws
- Can be quite challenging and time-consuming to develop
- Require good numerical ODE and algebraic-equation solvers
- Very effective and reliable; models can be used over a wide range of operating conditions
- Transparent; can be easily related to the physical parameters/characteristics of the process

Empirical:
- Models are usually black-box and discrete-time
- Model structures are extremely flexible, which implies they have to be assumed/known a priori
- Relatively much easier and less time-consuming to develop
- Require good estimators (plenty of them available)
- Model quality is strongly dependent on data quality; such models usually have poor extrapolation capabilities
- Difficult to relate to the physical parameters of the process (black-box) and to the underlying phenomena


System Identification
For several applications in practice, empirical models are very useful, owing to the frequent lack of sufficient process knowledge and to the flexibility in model structure.

System Identification: the subject of system identification is concerned with the development of (black-box) models from experimental data, with scope for incorporating any a priori process knowledge.

Given the complexity of processes, and the fact that industries routinely collect large amounts of data, it is sensible to build data-based models. The empirical approach is also favoured in disturbance modelling, which involves building time-series models.

First-principles models are very useful for off-line applications (e.g., simulations). In fact, simplified / reduced-order versions are used in on-line applications. First-principles models also contain some empirical correlations.

Measurement contains mixed effects


[Figure (recalled from Lecture 1): block diagram in which the input signal passes through actuators, the process and sensors to yield the measured responses and disturbances; measurable disturbances act on the process and sensor noise corrupts the measurements. The boundary around actuators, process and sensors marks the system identified by the user.]

Variations in the output measurement consist of more than what variations in the input can explain. The additional effects are due to disturbances and sensor noise.


Overall Model
The overall model developed through identification is a composite model. The deterministic part of the model is driven by a physical input, while the stochastic portion is driven by a shock wave (fictitious and unpredictable).
[Figure: composite model. A fictitious random shock wave drives a stochastic block, built using time-series modelling concepts, which contains the effects of noise, unmeasured disturbances, etc. Physical (exogenous) inputs drive a deterministic block, which contains the physics, i.e., the explicable part of the process. The two outputs add to give the observed process response.]

A good identification exercise separates the deterministic and stochastic parts with reasonable accuracy. One should not be contained in the other.
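A minimal simulation sketch of such a composite model (the coefficients below are hypothetical, chosen only for illustration): a deterministic first-order difference equation driven by a known input, plus a stochastic moving-average term driven by a random shock.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200

u = np.sign(np.sin(0.05 * np.arange(N)))   # physical (exogenous) input
e = rng.normal(scale=0.1, size=N)          # fictitious random shock

# Deterministic part: x[k] = 0.8 x[k-1] + 0.4 u[k-1]
x = np.zeros(N)
for k in range(1, N):
    x[k] = 0.8 * x[k-1] + 0.4 * u[k-1]

# Stochastic part (effects of noise / unmeasured disturbances):
# v[k] = e[k] + 0.6 e[k-1]
v = e + 0.6 * np.concatenate(([0.0], e[:-1]))

y = x + v   # observed response = deterministic + stochastic
```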

Systematic Procedure for Identification


The exercise of identification involves a systematic and iterative procedure, starting from the acquisition of good data to a careful validation of the identified model.

One can find five basic steps in this exercise:
S1 Collect the data set
S2 Choose an appropriate model structure
S3 Employ a criterion of fit and estimate the model parameters
S4 Assess the quality of the model (compare with known responses, errors in parameter estimates, etc.)
S5 If the model is unsatisfactory, go back to one or more of the previous steps

It is not possible, in general, to arrive at a satisfactory model in one iteration. A formal study of the subject equips the user with the knowledge to diagnose the shortcomings of the estimated model and to make appropriate refinements. A concrete sketch of this loop follows.
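Everything in this sketch is illustrative: the data are simulated from a hypothetical AR(2) process and the candidate structures are AR models of increasing order.

```python
import numpy as np

def fit_ar(y, order):
    # S3: least-squares fit of y[k] = a1*y[k-1] + ... + ap*y[k-p] + e[k]
    Phi = np.column_stack([y[order - i - 1 : len(y) - i - 1] for i in range(order)])
    theta, *_ = np.linalg.lstsq(Phi, y[order:], rcond=None)
    return theta

def prediction_rmse(y, theta):
    # S4: one-step-ahead prediction error on a fresh data segment
    p = len(theta)
    Phi = np.column_stack([y[p - i - 1 : len(y) - i - 1] for i in range(p)])
    return np.sqrt(np.mean((y[p:] - Phi @ theta) ** 2))

# S1: "collect" data (simulated here from a hypothetical AR(2) process)
rng = np.random.default_rng(1)
y = np.zeros(600)
for k in range(2, 600):
    y[k] = 0.7 * y[k-1] - 0.2 * y[k-2] + rng.normal()
train, test = y[:400], y[400:]

for order in range(1, 6):                        # S2: candidate structures
    theta = fit_ar(train, order)                 # S3: estimate
    print(order, prediction_rmse(test, theta))   # S4: assess; S5: iterate
```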

Identification Workflow
Primarily one finds three stages, which in turn contain sub-stages:
- All models should be subjected to a critical validation test
- Final model quality largely depends on the quality of the data
- Generating rich (informative) data is therefore essential
[Figure: identification workflow. Prior process knowledge feeds all stages. Data generation and acquisition (inputs → actuators → process → sensors, with measurable disturbances acting on the process) produce the data, which undergo visualization and pre-processing. Non-parametric analysis and optimization criteria guide the selection of candidate models, followed by model estimation and model quality assessment (residual analysis, estimation error analysis, cross-validation, ...). If the model is unsatisfactory, the workflow loops back to an earlier stage; otherwise the model is accepted.]


Points to remember

A minimal amount of process knowledge is incorporated at every stage of identification. When this knowledge directly governs the choice of model structure, we obtain grey-box models. Regardless of the modelling approach, certain facts should always be kept in mind:
- No model is accurate and complete
- Do not attempt to explain absolutely unpredictable uncertainties with a mathematical model
- Acceptance of models should be based on their usefulness rather than their truth


Data Acquisition
Data is food for identification. The quality of the final model depends on the quality of the data; hence great care must be taken in generating the data. The nature of the input, the sampling rate and the sample size are the primary influencing factors.

A vast theory exists on the design of experiments, particularly on input design, i.e., what kind of excitation is best for a given process. The nature of the input is tied to the end-use of the model: whether it is eventually used for control, fault detection or simulation. The theory of input design also allows us to obtain a preliminary idea of the model complexity that can be supported by a given data set. The basic tenet is that the excitation in the input should be such that its effect in the measured output is larger than the effects caused by sensor noise / unmeasured disturbances.
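A random binary excitation, in the spirit of the widely used PRBS, is one common practical choice. A minimal sketch follows (the amplitude and hold time are hypothetical and would in practice be matched to the process gain and time constant):

```python
import numpy as np

rng = np.random.default_rng(2)

def binary_excitation(n_samples, min_hold=5, amplitude=1.0):
    # Binary input held for a random number of samples per level;
    # min_hold should reflect the process time constant so that the
    # output response rises above the noise / disturbance level.
    u = np.empty(n_samples)
    k, level = 0, amplitude
    while k < n_samples:
        hold = rng.integers(min_hold, 3 * min_hold)
        u[k : k + hold] = level
        level = -level
        k += hold
    return u

u = binary_excitation(1000, min_hold=10)
```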


Data pre-processing
Often the data have to be subjected to quality checks and a pre-processing step before being presented to the model estimation algorithm. Some of the common factors that affect data quality are outliers, missing data and high levels of noise.

Outliers are data that do not conform to the rest of the data, largely due to sensor malfunctions and/or abrupt, brief process excursions. Detecting and handling outliers can be very complicated and challenging, primarily because there is no strict mathematical definition of an outlier and they vary from process to process. A few reasonably good statistical methods have emerged over the last few decades in this context, and the subject continues to evolve in search of robust, universal methods.

The issue of missing data is prevalent in several applications. Intermittent sensor malfunction, power disruptions, non-uniform sampling and data transfer losses are some of the common causes. Several methods have been devised to handle this issue in data analysis and identification; a simple sketch follows.
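One simple combined treatment (the window length and threshold below are hypothetical defaults, not prescriptions from the lecture): a moving-median (Hampel-style) outlier test, followed by linear interpolation of flagged and missing samples.

```python
import numpy as np

def clean_series(y, window=11, n_sigmas=3.0):
    # Flag outliers against a moving median, then linearly interpolate
    # the flagged points together with any missing (NaN) samples.
    y = np.asarray(y, dtype=float).copy()
    half = window // 2
    for k in range(half, len(y) - half):
        seg = y[k - half : k + half + 1]
        med = np.nanmedian(seg)
        mad = 1.4826 * np.nanmedian(np.abs(seg - med))  # robust sigma
        if mad > 0 and abs(y[k] - med) > n_sigmas * mad:
            y[k] = np.nan                               # treat as missing
    good = ~np.isnan(y)
    y[~good] = np.interp(np.flatnonzero(~good), np.flatnonzero(good), y[good])
    return y
```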

Data Preprocessing: Drifts and Trends

Pre-processing of data may also be motivated by the assumptions, limitations and requirements of the model development methods. Most methods assume stationarity of the data, a condition requiring the statistical properties of the data to remain invariant with time. When process responses exhibit drifts and trends, they have to be brought into the framework of stationary signals. Non-stationarities are usually handled either by first fitting a polynomial trend to the output data and modelling the residuals, or by differencing the data to a suitable degree and modelling the differenced data. The latter method is best suited to processes that exhibit integrating-type (random walk) behaviour. Both options are sketched below.
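A minimal sketch on simulated data (the polynomial degree and the degree of differencing are choices the user must make):

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(500, dtype=float)
y = 0.01 * t + np.cumsum(rng.normal(size=500))   # drift + random-walk component

# Option 1: fit and remove a polynomial trend, then model the residuals
coeffs = np.polyfit(t, y, deg=1)
y_detrended = y - np.polyval(coeffs, t)

# Option 2: difference once and model the differenced series
# (best suited to integrating / random-walk behaviour)
y_diff = np.diff(y)
```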


Data Preprocessing: Pre-filtering

Pre-filtering data is an elegant way of encompassing the methods for handling drifts and trends, and the idea has broader application. Often it may be required that the model be preferentially more accurate in selected frequency ranges (in control, for instance). Moreover, several model structures can theoretically be shown to be equivalent, by expressing one model structure as another developed on pre-filtered data. Pre-filtering is therefore a powerful framework for data pre-processing; a sketch of the mechanics follows. In passing, it may be mentioned that data pre-processing can consume the majority of the identification exercise, both in time and effort.
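In this minimal sketch the band edges are arbitrary; in practice they are chosen from the frequency range where model accuracy matters most. The same filter is applied to both input and output so that the input-output relationship is preserved.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(4)
u = rng.standard_normal(1000)
# Hypothetical process: first-order dynamics plus measurement noise
y = signal.lfilter([0.3], [1.0, -0.7], u) + 0.1 * rng.standard_normal(1000)

# Emphasize a band of interest (normalized frequencies, Nyquist = 1)
b, a = signal.butter(4, [0.05, 0.4], btype="bandpass")
u_f = signal.lfilter(b, a, u)   # apply the SAME filter to the input ...
y_f = signal.lfilter(b, a, y)   # ... and the output before estimation
```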


Visualization
Visualizing data is a key step in information extraction and signal analysis; the value of the information obtained from visual inspection at each stage of identification is immense. Prior to the pre-processing stage, visual examination assists in manually identifying the presence of drifts, outliers and other peculiarities. It also provides an opportunity for the user to qualitatively verify the quality of the data from an identification viewpoint (e.g., sufficient excitation). A careful examination can also provide preliminary information on the delay, dynamics and gain of the process. The information obtained at this stage can be used at the model quality assessment stage.


Data Visualization (contd.)

Powerful methods exist for visualizing multi-dimensional or multivariable data, and the user can exploit their effectiveness in selecting the right candidate models. After model development, a visual comparison of the predictions against the observed values should be strongly preferred to a single index, such as a correlation or similarity factor, for assessing the quality of the model. Finally, visualization of data in a transform domain (e.g., the Fourier domain) can prove very beneficial. Examination of the input and output in the frequency domain can throw light on the spectral content and the presence of periodicities, as sketched below. Time-frequency analysis tools such as wavelet transforms can further provide valuable information on the non-stationary behaviour of the process.
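A minimal frequency-domain inspection sketch (the signal is simulated; in practice it would be the measured input or output):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
t = np.arange(1024)
y = np.sin(0.2 * np.pi * t) + 0.5 * rng.standard_normal(1024)  # periodicity + noise

# Periodogram: squared DFT magnitude against frequency (cycles/sample)
freqs = np.fft.rfftfreq(len(y), d=1.0)
P = np.abs(np.fft.rfft(y)) ** 2 / len(y)

plt.semilogy(freqs, P)                 # peak near 0.1 cycles/sample
plt.xlabel("Frequency (cycles/sample)")
plt.ylabel("Periodogram")
plt.show()
```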


Choice of candidate models

The development of a model from data involves two steps:
i. Choosing a model structure and order
ii. Estimating the parameters of that model by solving the associated optimization problem

For a given data set and estimation algorithm, the type, structure and order of the model completely determine its predictive ability. The challenge in identification is usually experienced at this stage: there is usually more than one model that can explain the data. However, careful adherence to certain guidelines enables the user to discover a good working model.


Two important facts

In choosing a candidate model, one should specifically take cognizance of two facts:
- A correct model is beyond the reach of any modelling exercise; so to speak, no process can be accurately described by mathematics alone.
- The flexibility of empirical modelling equips the user with tremendous freedom, but it also carries risks of overfitting and of unsuitability with respect to the model's end-use.


Guidelines for selecting candidate models


While no hard rules can be framed for selecting a candidate model, practical guidelines can be followed to arrive at a suitable one:
- Careful scrutiny of the data to obtain preliminary insights into the input-output delay, the type of model (e.g., linear/non-linear), the order (e.g., first/second-order), etc.
- Estimation of non-parametric models, i.e., models that do not assume any structure. Non-parametric models provide reliable estimates of (i) delays, (ii) step, impulse and frequency responses (process bandwidth) and (iii) the disturbance spectrum, without any assumptions on the process dynamics (see the sketch below)
- Choosing an estimation algorithm that is commensurate with the type and structure of the model (e.g., least-squares algorithms provide unique solutions when applied to linear predictive models)

The model selection step is iteratively improved by the feedback obtained from the model validation stage. In addition, one can incorporate any available process knowledge to impose a specific model structure.
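A minimal sketch of one such non-parametric estimate: a finite impulse response (FIR) model fitted by least squares, which recovers the impulse response (and hence the delay) without assuming any structure beyond its length. All signals here are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
N, M = 2000, 30                        # data length, impulse-response length
u = rng.standard_normal(N)             # exciting input
g_true = 0.5 * 0.7 ** np.arange(M)     # "unknown" impulse response ...
g_true[:2] = 0.0                       # ... with a delay of 2 samples
y = np.convolve(u, g_true)[:N] + 0.05 * rng.standard_normal(N)

# FIR model y[k] = sum_i g_i u[k-i]: least-squares estimate of g
Phi = np.column_stack([np.concatenate((np.zeros(i), u[: N - i])) for i in range(M)])
g_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(np.round(g_hat[:5], 3))   # leading coefficients ≈ 0 reveal the delay
```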

Estimation Criteria
The estimation criterion typically takes the form of the minimization of a function of the prediction errors. The function is usually based on a distance metric; for example, least-squares methods minimize the Euclidean distance between the predicted and observed values. Other factors, such as the quality of parameter estimates, the number of parameters and a priori knowledge of the model structure, can also be factored into the objective function. Alternatively, the objective function can be formulated from probabilistic considerations; the popular maximum likelihood method is based on this approach. Most existing algorithms can be considered variants / extensions of the least-squares and maximum likelihood methods. A least-squares sketch follows.
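A minimal sketch of the least-squares criterion for a simple linear predictor $\hat{y}[k] = a\,y[k-1] + b\,u[k-1]$ (the model and data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 500
u = rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.8 * y[k-1] + 0.5 * u[k-1] + 0.1 * rng.normal()

# Minimize sum_k (y[k] - a*y[k-1] - b*u[k-1])^2 over (a, b)
Phi = np.column_stack((y[:-1], u[:-1]))    # regressors
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta)                               # ≈ [0.8, 0.5]
```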


Factors governing estimation criteria


In setting up the estimation problem, the following points should be given due consideration:

Factors
- End-use of the model: The general requirement is that the one-step-ahead prediction errors be minimal. In general, a large class of end-use requirements can be brought under the umbrella of minimizing a function of the prediction error.
- Domain of estimation: The estimation problem can be greatly simplified by setting it up in a transformed domain. For example, the delay estimation problem is nicely formulated in the frequency domain and is known to produce better estimates than time-domain methods.
- Accuracy and precision of estimators vs. complexity of estimation: Deploying a method that produces accurate, high-precision estimates may be computationally demanding (e.g., likelihood methods). The user should keep this trade-off in view when selecting an algorithm.

Model Validation
Model validation is an integral part of any model development exercise, be it by an empirical or a first-principles approach. The model is tested for predictive ability, reliability (variability in parameter estimates) and complexity (overfitting).

Quality assessment tests


- Statistical analysis of prediction errors (residuals): Correlation is the primary tool here. A good model should produce residuals that are uncorrelated with the input as well as with their own past (see the sketch below).
- Error analysis of parameter estimates: The errors should be small relative to the estimated values. Overfitting is indicated by large errors in a subset of the parameter estimates when a model with fewer parameters can explain the data equally well. This should not be confused with a loss of identifiability.
- Cross-validation: A model should have both good interpolation and extrapolation capabilities, with stronger emphasis on the former. For this purpose, a test data set is used, consisting of a mix of features not contained in the training set and features similar to those in the training set.
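A minimal sketch of the residual correlation tests (the ±1.96/√N band is the usual large-sample 95% approximation; `resid` and `u` are assumed to be the residuals of an estimated model and the corresponding input):

```python
import numpy as np

def correlation_tests(resid, u, max_lag=20):
    # Residuals should be uncorrelated with their own past (whiteness)
    # and with the input (no unexplained deterministic dynamics).
    N = len(resid)
    band = 1.96 / np.sqrt(N)
    r_ee = [np.corrcoef(resid[:-l], resid[l:])[0, 1] for l in range(1, max_lag + 1)]
    r_ue = [np.corrcoef(u[:-l], resid[l:])[0, 1] for l in range(1, max_lag + 1)]
    ok_auto = np.all(np.abs(r_ee) < band)
    ok_cross = np.all(np.abs(r_ue) < band)
    return ok_auto, ok_cross
```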

Model Refinement
The outcome of the diagnostic tests constitutes the feedback to the previous stages. When a model fails to meet any of the aforementioned requirements, it immediately calls for improvements at one or more of the previous steps. The developed model can be unsatisfactory for several reasons:
- Poor choice of criterion.
- The numerical procedure was not able to find the best model despite a good choice of criterion.
- The model structure is not sufficient to explain the variations in the data set; it may be necessary to choose from a different family of models.
- The data supplied to the estimation procedure are not informative enough; conduct new experiments or use pre-processing techniques to alleviate the noise.

The user should carefully interpret the results of the diagnostic tests and attribute the failure of the model to one or more of the above causes.

Guidelines for obtaining good models


The foregoing discussion can be summarized into a set of guidelines / requirements for obtaining good models:
- Good quality data (input and experimental design)
- Physical insight into the process and an appropriate understanding of the model structure; simple models can be good approximations of complicated systems!
- A proper choice of time-scale for modelling (resampling the data); fast sampling of slow processes can result in unstable models
- Right interpretation of model validation and model quality assessment; the model diagnostic checks reveal considerable information on model sufficiency and can contain clues to directions for improvement
- No substitute for thinking, insight and intuition!

