
Steel slab temperature modelling using neural and Bayesian networks

Perttu Laurinen¹, Juha Röning¹ and Harri Tuomela²


¹ Machine Vision and Intelligent Systems Group, Computer Engineering Laboratory, PO Box 4500, FIN-90014 University of Oulu, Finland.
² Rautaruukki Steel, Hot Strip Mill, PO Box 93, 92101 Raahe, Finland.
E-mail: perttu@ee.oulu.fi, jjr@ee.oulu.fi, harri.tuomela@rautaruukki.fi

The walking beam furnace and roughing mill of the Rautaruukki Steel hot strip mill in Raahe were studied. The interactions between the process variables were modelled with Bayesian networks, and an alternative model for physical control is presented. This model uses neural networks to predict the post roughing mill temperature of a steel slab while the slab is still in the furnace. The prediction can be used as feedback to adjust the furnace parameters so that the pre-set slab temperature is attained more reliably. The neural network results are accurate enough to be used to make the existing physical model more adaptive. Bayesian networks helped to clarify the phenomena involved and to encode expert knowledge in an easily understandable form. The data were collected in the autumn of 1999 and consisted of more than 250,000 observations.

Keywords: Neural networks, Bayesian networks, steel industry, re-heating furnace, adaptivity.

The roughing mill is the first mill through which the heated slabs pass. For rolling to be successful, the slab temperature should match the pre-defined temperature, so that the roughing mill parameters can be adjusted to optimal settings. Even more important for the rolling process as a whole, and for the quality of the end product, is that slabs with varying metallurgical properties are heated accurately to a temperature suitable for rolling. This is not always achieved. The main reasons are the complexity of the process and deficiencies in the existing physical model. In addition, the plant produces customized steel in varying batches, so the slabs in the furnace at any one time do not necessarily have similar metallurgical properties, which makes the heating process even more difficult. The existing physical model was designed by Stein Heurtey. Its deficiency is that it is inflexible and does not predict slab temperatures adaptively enough in the changing environment of the furnace. In this paper, an alternative approach using Bayesian and neural networks is presented. The Bayesian network model was used to study the interactions between the variables, while neural networks were used to predict the post roughing mill temperature adaptively. Neural network models in process control have often turned out to be better than statistical, empirical or other traditional models [1].

Steel strips are produced in a hot strip mill. Before passing through the mills, the steel slabs have to be heated to a pre-defined temperature suitable for rolling. Figure 1 shows the first part of a hot strip mill. The heating takes place in a heating furnace; the two most common furnace types are the pusher type furnace and the walking beam furnace. The pusher type furnace represents older technology, while the walking beam furnace is more modern. Our research focused on the walking beam furnace. The high operating temperatures of the furnaces make it difficult to collect measurement data, and some quantities cannot be measured at all. One such quantity is the core temperature of a steel slab, which can be measured only after the slab has passed through the roughing mill.

Figure 1: From the furnace to the roughing mill. The figure also presents the different sources of the information recorded during the heating process.

The proposed models are built around the post roughing mill temperature, which is measured from the pre-rolled strip. It would be useful to have an estimate of this temperature already while the slab is in the furnace. A reliable estimate would serve as feedback, allowing the furnace parameters to be adjusted so that the slab temperature more closely meets the pre-defined temperature. The steps of model development are presented in Figure 2. First, the data were collected and pre-processed; pre-processing includes the selection and classification of variables, which are shown as separate steps in the figure. After this, the flow branches: the upper part includes the steps for generating the neural network, and the lower part the steps for generating the Bayesian network.

Data description and pre-processing

In general, data pre-processing is divided into two subtasks: 1) feature or variable transformation and 2) subset selection [3]. Feature transformation consists of tasks such as forming new variables from existing ones and combining data from different sources. Subset selection consists of selecting the variables to be analysed and the data set to be used. The method for selecting variables depends on factors such as the investigator's prior knowledge of the topic, the amount of data, the number of variables and the ratio of the amount of data to the number of variables [4].

The amount of data recorded from the heating process is large. The data of interest for this research are stored in four database tables. The first table contains initial information about the slabs, including dimensions and material percentages, and the second contains information about the slabs as they pass through the furnace. The third table holds time series information about each zone of the furnace, and the fourth holds time series information about the furnace as a whole. The sources of these four tables are also marked in Figure 1. Variables were first selected using expert knowledge, and the set of variables was reduced further in the later phases of neural network modelling. After a careful selection of the most important variables from the four tables, about 50 of the original 200 variables remained. The data then had to be combined into a single table, in two different formats for the two modelling methods. This was time-consuming both to plan and to compute: the tables contained hundreds of thousands of observations, which had to be searched for the records matching the observations in the other tables. Figure 3 shows how the different tables were combined into a time series format.
The time series format includes observations about each slab as it passes through the furnace. Database table 2 (the time series from the slabs) forms the core of the combined data. From table 1, the matching records were chosen based on identical slab ID codes; from table 3, the observations with matching zone and time stamp values were combined; and from table 4, only the time stamps were used for matching. The slabs are presented in temporal order, i.e. the information about the first slab passing through the furnace appears first, followed by the information about the second slab, and so on.
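A minimal sketch of this time series join, assuming the four tables have already been loaded as Python dictionaries and lists and that the zone and furnace time stamps are aligned to the same sampling instants as the slab records. The paper matches furnace data by "similar" time stamps, so exact matching is a simplification here, and all field names (`slab_id`, `time`, `zone`, `width`, and so on) are hypothetical.

```python
def combine_time_series(slab_info, slab_series, zone_series, furnace_series):
    """Attach tables 1, 3 and 4 to the core records of table 2.

    slab_info:      {slab_id: {...}}                         table 1, keyed by slab ID
    slab_series:    [{"slab_id", "time", "zone", ...}, ...]  table 2, the core records
    zone_series:    {(zone, time): {...}}                    table 3, keyed by zone and time stamp
    furnace_series: {time: {...}}                            table 4, keyed by time stamp only
    """
    # Order slabs by the time they were first seen, then each slab's records by time,
    # so that all records of the first slab come first, then those of the second, etc.
    first_seen = {}
    for rec in slab_series:
        sid = rec["slab_id"]
        first_seen[sid] = min(first_seen.get(sid, rec["time"]), rec["time"])
    ordered = sorted(
        slab_series,
        key=lambda r: (first_seen[r["slab_id"]], r["slab_id"], r["time"]),
    )

    combined = []
    for rec in ordered:
        row = dict(rec)                                              # table 2 record
        row.update(slab_info.get(rec["slab_id"], {}))                # table 1: by slab ID
        row.update(zone_series.get((rec["zone"], rec["time"]), {}))  # table 3: zone + time
        row.update(furnace_series.get(rec["time"], {}))              # table 4: time only
        combined.append(row)
    return combined


# Usage with toy data (hypothetical values)
slab_info = {"A": {"width": 1.2}, "B": {"width": 1.5}}
slab_series = [
    {"slab_id": "B", "time": 2, "zone": 1},
    {"slab_id": "A", "time": 1, "zone": 1},
    {"slab_id": "A", "time": 2, "zone": 2},
]
zone_series = {(1, 1): {"zone_temp": 900}, (1, 2): {"zone_temp": 910}, (2, 2): {"zone_temp": 1200}}
furnace_series = {1: {"pressure": 5}, 2: {"pressure": 6}}
rows = combine_time_series(slab_info, slab_series, zone_series, furnace_series)
```

Later tables overwrite earlier keys when column names clash, so in practice the columns would need unique names before joining.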

[Figure 2 flowchart. Boxes: Data collection & pre-processing; Choosing the essential variables; Classifying the variables; Choosing the network structure; Going through the results; Forming the model; Fixing the model with an expert.]

Figure 2: Generation of models.

Our extensive review of existing applications showed that no comparable system had been reported. However, there are quite a few neural network applications in steel production, covering the whole production process. The most similar research has been done on slab temperature prediction [2]. These approaches aim to generate a neural network model from information obtained from thermometers installed inside special slabs, which are run through the furnace. These thermometers are, however, so expensive that only a few slabs can be run through the furnace, and collecting a large number of observations is not economically feasible. It may also be false to assume that the data gathered with these few slabs are representative. Moreover, such an approach cannot be fully adaptive, because the special slabs cannot be run through the furnace continuously. The models presented in this paper are based on data that are gathered on-line and available all the time.

Combining the data into a slabwise format was slightly more complicated. This combined data set contains only one observation per slab; where the source tables contained several observations about one slab, their mean was used (weighted means, as indicated in Figure 4). Figure 4 shows how the data were combined in this case; here, database table 1 forms the core of the combined data. After the combination, the slabwise data had to be classified, because the modelling tool accepted only class variables. The classification was done in co-operation with an expert, using basic statistics and histograms of the data. An example of a classified variable is the post roughing mill temperature. The combined data consisted of 254,229 observations in the time series format and 14,666 observations in the slabwise format. The data were gathered between 29.9.1999 and 2.11.1999.
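The slabwise aggregation and classification can be sketched as follows. The weights are a hypothetical choice (for example, the length of the time interval an observation covers), since the paper does not state which weights were used. Only the post roughing mill temperature is classified, using the five classes that appear in Table 1; the expert-chosen classes of the other variables are not given. Assigning a value that lies exactly on a boundary to the upper class is also an assumption.

```python
from bisect import bisect_right
from collections import defaultdict

# Class boundaries (degrees C) for the post roughing mill temperature, as in Table 1.
BOUNDS = [1100, 1120, 1140, 1160]
LABELS = ["<1100", "1100-1120", "1120-1140", "1140-1160", ">1160"]


def slab_weighted_means(observations):
    """observations: iterable of (slab_id, value, weight) -> {slab_id: weighted mean}."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for slab_id, value, weight in observations:
        weighted_sum[slab_id] += value * weight
        weight_total[slab_id] += weight
    # Assumes every slab has a positive total weight.
    return {sid: weighted_sum[sid] / weight_total[sid] for sid in weighted_sum}


def classify(temp_c):
    """Map a temperature to its class label; a value on a boundary goes to the upper class."""
    return LABELS[bisect_right(BOUNDS, temp_c)]


means = slab_weighted_means([("A", 1100.0, 1.0), ("A", 1130.0, 2.0), ("B", 1170.0, 1.0)])
classes = {sid: classify(t) for sid, t in means.items()}
```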

The two methods used in the analysis were Bayesian networks and neural networks, which complement each other. With Bayesian networks, it is possible to present the interactions between variables and gain a better understanding of the process itself. It is also possible to study the conditional probabilities between variables, but here more weight was placed on understanding the complex structure of the process. Neural networks are a non-parametric method that allows very flexible modelling of the response variables; with neural networks, however, it is hard to gain a better understanding of the observed phenomena. Neural networks were used to predict the post roughing mill temperature.

A Bayesian network consists of nodes (variables) and directed connections between them. More specifically, its structure is a directed acyclic graph, i.e. a graph with only directed connections and no directed cycles. An example of a Bayesian network is shown in Figure 5. A node at the origin of an arrow is called a parent of the node the arrow points to, and that destination node is called a child. The child node has a probability distribution conditional on the states of its parents, i.e. the probability of the child being in a certain state varies according to the states of its parents. One of the computational benefits of Bayesian networks is that a node is conditionally independent of all its non-descendants, given the states of its parents. In other words, if a vector of data $y_k = (y_{1k}, y_{2k}, \ldots, y_{Ik})$ is observed, where $I$ denotes the number of nodes in the network structure, the joint probability of the event is

$$p(y_k) = \prod_{i=1}^{I} p\left(y_{ik} \mid \text{parents of } y_{ik} \text{ in the observed state}\right).$$
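The factorization above can be illustrated with a small sketch: the joint probability of a complete observation is the product, over the nodes, of each node's probability given the observed states of its parents. The two-node network and its numbers are hypothetical and chosen only for illustration.

```python
def joint_probability(assignment, parents, cpts):
    """Joint probability of a full assignment via p(y) = prod_i p(y_i | parents(y_i)).

    assignment: {node: observed state}
    parents:    {node: tuple of parent nodes}
    cpts:       {node: {tuple of parent states: {state: probability}}}
    """
    p = 1.0
    for node, state in assignment.items():
        parent_states = tuple(assignment[q] for q in parents[node])
        p *= cpts[node][parent_states][state]
    return p


# Hypothetical two-node network: roof temperature -> slab temperature (illustrative numbers).
parents = {"roof": (), "slab": ("roof",)}
cpts = {
    "roof": {(): {"low": 0.4, "high": 0.6}},
    "slab": {("low",): {"cold": 0.7, "warm": 0.3},
             ("high",): {"cold": 0.1, "warm": 0.9}},
}
p = joint_probability({"roof": "high", "slab": "warm"}, parents, cpts)  # 0.6 * 0.9
```

Only each node's table conditional on its own parents is needed, never one table over all $I$ variables at once.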

[Figure 3 diagram. Time series from slabs: records with which other data is combined. Basic information from slabs: basic parameters for each record by ID number. Time series from zones: zone parameters by timestamp and displacement. Time series from furnace: furnace parameters by similar timestamps. Output: Time Series Data.]

Figure 3: Combining the data into a time series format. The boxes labelled 1, 2, 3 and 4 stand for the different tables in the database.

This factorization results in a large reduction in computation, since the probabilities have to be calculated only conditional on the parents of each node, not conditional on the whole set of variables [5]. When dealing with large groups of variables, numbering in the hundreds, this property becomes crucial [6].

[Figure 4 diagram. Basic information from slabs: records with which other data is combined. Time series from slabs: weighted means by ID number. Time series from zones: weighted means of zone parameters by time intervals. Time series from furnace: weighted means of furnace parameters by time interval. Keys for combining: ID number, timestamps, displacement. Output: Slab Data.]

Figure 4: Combination of the data into a slab format.

Note that the network connections do not necessarily represent causal relationships. Some authors are strongly in favour of a causal interpretation [7], while others are more cautious [6]. If the network has been formed solely from data, it is inappropriate to interpret the connections as causal [8]. The Bayesian network structure can be generated from expert knowledge, with an automatic search algorithm, or by a combination of these two techniques. The search algorithms compare different structures using goodness-of-fit measures. These include, for example, the Bayesian quality measure, the minimum description length measure, which originates in coding theory, and information measures [9]. Different structures can also be assigned different prior probabilities, or a uniform distribution if no prior information is available. It is also common to include a penalty for excessively complex structures in models of this kind. One well-known penalty term is proportional to Dim(B) log n, where Dim(B) is the number of parameters needed to describe the joint probability distribution of network structure B and n is the number of observations. For more information on Bayesian networks, [10] and [5] are recommended.

Neural networks are probably a somewhat better known method than Bayesian networks. They are a data-driven method; in this case, the multi-layer perceptron model was used. When neural networks are used in practice, the data are usually divided into two sets: a training set and a test set. The training set is used to adjust the parameters of the network, and the independent test set is used to assess how well the network has learnt, i.e. how well it generalizes to unseen data [11]. In applications requiring an adaptive approach, training the network once on a fixed training set could lead to large estimation errors, because the environment keeps changing. In time series modelling, there is usually no separate test set; the future observations themselves constitute the test set. The network is trained with the data at hand, a prediction for the future observation(s) is given, and when the future becomes the present, the prediction error is calculated.
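As a sketch of how a search algorithm could score a candidate structure, the following computes the maximum-likelihood log-likelihood of discrete data under a structure B and subtracts the BIC penalty ½·Dim(B)·log n, one standard member of the Dim(B) log n family. It also estimates conditional probability tables by relative frequencies and gives a uniform distribution to parent configurations that never occur in the data, which matches the behaviour shown in the last row of Table 1. This is an illustration under these assumptions, not the scoring actually used by the Bayesian Knowledge Discoverer; the records and variable names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def dim_b(parents, cardinality):
    """Dim(B): free parameters of a discrete network, sum over nodes of (r_i - 1) * q_i."""
    total = 0
    for node, pa in parents.items():
        q = 1                                  # number of parent configurations
        for p in pa:
            q *= cardinality[p]
        total += (cardinality[node] - 1) * q
    return total


def log_likelihood(data, parents):
    """Maximum-likelihood log-likelihood of `data` (list of dicts) under structure `parents`."""
    ll = 0.0
    for node, pa in parents.items():
        joint = Counter((tuple(r[p] for p in pa), r[node]) for r in data)
        marginal = Counter(tuple(r[p] for p in pa) for r in data)
        for (pa_state, _), count in joint.items():
            ll += count * math.log(count / marginal[pa_state])
    return ll


def bic_score(data, parents, cardinality):
    """Log-likelihood penalized by (Dim(B) / 2) * log n; higher is better."""
    n = len(data)
    return log_likelihood(data, parents) - 0.5 * dim_b(parents, cardinality) * math.log(n)


def estimate_cpt(data, node, pa, states):
    """Relative-frequency p(node | pa); unseen parent configurations get a uniform row."""
    table = {}
    for pa_state in product(*(states[p] for p in pa)):
        rows = [r[node] for r in data if tuple(r[p] for p in pa) == pa_state]
        if rows:
            counts = Counter(rows)
            table[pa_state] = {s: counts[s] / len(rows) for s in states[node]}
        else:
            table[pa_state] = {s: 1.0 / len(states[node]) for s in states[node]}
    return table


data = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4   # toy data: B always equals A
card = {"A": 2, "B": 2}
empty = {"A": (), "B": ()}
edge = {"A": (), "B": ("A",)}
scores = {"empty": bic_score(data, empty, card), "A->B": bic_score(data, edge, card)}
```

On this toy data the structure with the edge A to B scores higher despite its extra parameter, because it explains B perfectly.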

Application Results

Bayesian networks

Bayesian networks are a relatively new method, and the available tools are being actively developed. The preliminary Bayesian network model was created with the Bayesian Knowledge Discoverer [12]. At the time of the analysis, it was found to be the most suitable software package for generating the network, especially because of its ability to discover the structure between variables automatically. The preliminary network model was generated automatically using search algorithms. The number of parents a node could have was limited to one. This decision was made after comparing network structures with different numbers of parents: a structure with multiple parents was far too complicated to be understood and further developed by an expert. Moreover, Bayesian networks applied in practice do not usually have many connections originating from individual variables [13].

After this preliminary model had been generated, it was presented to an expert, who gave his opinion on the connections, indicating which were correct, which were wrong and which were missing. The modified model is presented in Figure 5. It turned out to be quite different from the preliminary model, although the preliminary model was a good indicator of what the final model should look like. In the preliminary model, the material percentages and the heating code were nicely grouped together, and that part of the model was left unchanged. The program also got the slab dimensions right, but it did not do as well with the gases and temperatures, which were notably altered.

Three kinds of connections can be found in Figure 5: connections between groupings of variables, between individual variables, and between groupings of variables and individual variables. The connections should primarily be interpreted as interactions between variables, not as causal relationships. The Bayesian Knowledge Discoverer did not support groupings of variables; they were added to the graph to clarify the model and to convey extra information. The variables make up eight larger groups. An example of a connection between groups is the connection between the slab dimensions and the gas flows, which reflects the fact that slabs with different dimensions need different amounts of gas to heat up to the pre-defined temperature. The only connection between a variable and a grouping of variables is that between the temperature of the outgoing air and the gas temperatures: when the gases in the furnace are warmer, the outgoing air is also warmer.

Based on the interactions between variables, the conditional probability distributions can be studied. Table 1 shows the conditional probabilities of the post roughing mill temperature given the roof temperatures of the left and right sides of the furnace. Because both roof temperature variables have four classes and the post roughing mill temperature has five classes, the matrix of conditional probabilities is of size 16×5; only its most interesting rows are listed in Table 1. If the temperature of the right side of the roof was lower than 1230°C and that of the left side was 1230-1255°C, the slab temperature was almost certainly within 1100-1120°C. If both roof temperatures were under 1230°C, the slab temperature was most probably in the lower temperature range; in particular, the probability of a slab temperature under 1100°C was largest within this roof temperature range. When the roof temperatures were in any other range, the probability of a slab being under 1100°C was at most 0.036. On the other hand, when the roof temperatures were over 1280°C, the slabs were also more likely to be warmer: the probability of such a slab being under 1140°C was close to zero. The table also shows that slabs were most likely to belong to the highest temperature class when the temperature of the right roof was higher than 1280°C and that of the left roof was within 1255-1280°C. The last row of the table shows what happens when the data contain no observations matching the classes of the parent variables: the classes are given uniform probabilities.

[Figure 5 diagram. Variable groups: Gas flows on zones; Slab dimensions; Gas temperatures; Temperatures & gases; Phys. model roof temps; Combustion gases; Material percentages. Nodes: oxygen level, liquid gas flow and air flow for the top and bottom zones; lateral deflection; weight; initial tmp.; sulfur; furnace pressure; incoming air tmp.; air tmp. top and bottom; roof tmp. 1 and 2; floor tmp. 1 and 2; post rough. mill temp.; recuperator intake and outgoing gas temps; combustion gases oxygen level; liquid gas flow; outgoing air temp.; outgoing gas temp.; heating code; carbon / manganese; dissolved aluminum; left and right roof phys. model; left and right waste.]

Figure 5: The final structure of the Bayesian network.

Roof tmp 1     Roof tmp 2     Post roughing mill temperature (°C)
(right side)   (left side)    <1100   1100-1120   1120-1140   1140-1160   >1160
<1230          1230-1255      0.006   0.976       0.006       0.006       0.006
<1230          <1230          0.158   0.735       0.106       0.001       0.001
>1280          >1280          0.001   0.001       0.001       0.830       0.167
>1280          1255-1280      0.002   0.002       0.002       0.746       0.250
<1230          >1280          0.200   0.200       0.200       0.200       0.200

Table 1: Conditional probabilities of the post roughing mill temperature given the roof temperatures (°C).

Neural networks

The neural network model was based on time series data from the soaking zones. The soaking zones were the only zones included in the model, since experimentation showed that information from the preceding zones did not help to predict the post roughing mill temperature. The models were implemented with Matlab 5.3 and its Neural Network Toolbox. The modelling was done on a SUN workstation with four 450 MHz processors and three gigabytes of memory, although only one processor was used at a time. The data were divided into smaller portions to facilitate the computations. Processing each portion took 10 to 15 minutes, including the time needed to load and pre-process the data.

The variables included in the neural network model are the same as those in the Bayesian network, except that the material percentages and the three unconnected variables of the Bayesian network were left out, since they appeared to have only a minor effect on the prediction. The neural network input variables were grouped into four vectors, similarly to the grouping in the Bayesian network. Otherwise, the network was a standard feedforward neural network [11, p.116] with 12 tanh [11, p.127] activation functions. The approach had to be adaptive, because the environment in the furnace is not constant, and this sets some extra requirements for the model. From the viewpoint of the roughing mill, the data that the network predicts are time series data: a varying number of measurements is recorded from each slab, the measurements of a slab are ordered chronologically, and the slabs are also ordered chronologically. Figure 7 illustrates this: each horizontal line represents a slab, and the length of the line represents the amount of data about that slab. The neural network was trained with previous measurements of the process and pre-rolled strip temperatures to predict a future temperature. This was implemented with a moving time window containing a certain number of slabs. After experimentation with different window sizes, the window size was set to approximately 150 data points. The size was not exactly 150, because the number of measurements on each slab was not constant: the window included as many slabs as were needed for it to contain at least 150 data points. The predictions were smoothed by averaging: the smoothed prediction at the second data point of a slab was the mean of the first two predictions, at the third data point it was the mean of the first three predictions, and so on, so that at the last data point it was the mean of all the predictions for that slab.
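The moving window and the smoothing step can be sketched as below. The sketch assumes that the measured temperature of every earlier slab is already available when the next slab is predicted (in the mill, slabs still between the soaking zones and the roughing mill would not yet have been measured). `mean_predictor` is a hypothetical stand-in for the trained multi-layer perceptron, used only so that the example runs; it is not the authors' model.

```python
from statistics import mean


def select_window(slab_sizes, end, min_points=150):
    """Indices of the most recent slabs before index `end` holding >= min_points data points.

    slab_sizes[i] is the number of data points recorded for slab i (slabs in time order).
    If fewer points exist, all earlier slabs are returned.
    """
    chosen, total, i = [], 0, end - 1
    while i >= 0 and total < min_points:
        chosen.append(i)
        total += slab_sizes[i]
        i -= 1
    return sorted(chosen)


def smooth(raw_predictions):
    """Running mean over one slab: the k-th output is the mean of the first k predictions."""
    out, running = [], 0.0
    for k, p in enumerate(raw_predictions, start=1):
        running += p
        out.append(running / k)
    return out


def mean_predictor(window, inputs):
    """Hypothetical stand-in for the trained network: predicts the window's mean temperature."""
    m = mean(temp for _, temp in window)
    return [m] * len(inputs)


def walk_forward(slabs, fit_and_predict, min_points=150):
    """slabs: list of (inputs, measured_temp) in time order; inputs holds one vector per point.

    Each slab is predicted from a model fitted on the preceding window only; the last
    smoothed prediction is paired with the measurement, ready for error statistics.
    """
    sizes = [len(inputs) for inputs, _ in slabs]
    results = []
    for t in range(1, len(slabs)):
        window = [slabs[i] for i in select_window(sizes, t, min_points)]
        raw = fit_and_predict(window, slabs[t][0])
        results.append((smooth(raw)[-1], slabs[t][1]))
    return results


slabs = [([[0.0]] * 2, 1100.0), ([[0.0]] * 3, 1120.0), ([[0.0]], 1110.0)]
pairs = walk_forward(slabs, mean_predictor)
```

Because the window is rebuilt for every slab, the model is always fitted to the most recent furnace conditions, which is the adaptivity the text asks for.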
Since the post roughing mill temperature has not previously been predicted at the mill, there is no existing model whose predictions could serve as a reference. The fit of the prediction can therefore only be assessed with expert opinion and statistics. The error was calculated as the difference between the last smoothed prediction of each slab and the actual measurement. Tables 2 and 3 contain statistics concerning the absolute value of this error. A prediction error smaller than 5°C can be considered extremely good, an error under 10°C normal or tolerable, and an error of more than 15°C too large. As can be seen from Table 2, the mean error was less than 10°C for every data set, and the median is even smaller, 5.63°C on average. Table 3 shows the proportions of the predictions that were within 5°C and within 10°C of the actual value, and the proportion that differed from it by more than 15°C. On average, 47% of the predictions were within 5°C of the actual value and 73% within 10°C, while 14% differed by more than 15°C.

An example of a prediction made with the model is presented in Figure 6. The solid line represents the measured post roughing mill temperature: each horizontal part of the line corresponds to one slab, and its length represents the amount of data recorded from that slab in the soaking zones. The dashed line represents the smoothed prediction. As can be seen, the prediction follows the measured values quite well. The problem is the peaks in the prediction, which occur at random. They make the prediction unreliable, even though it is accurate on average. The model does not include confidence intervals; using them might help to avoid the problem of the peaks. The reliability of the measurements should also be noted. Information from sensors can sometimes be unreliable: in an environment as extreme as the furnace, operating the sensors is very demanding, and unreliable measurements may lead to prediction errors. Finally, Figure 7 shows an enlargement of the prediction for observations 1200 to 1900.

Data set      Mean   Median   Std
29.9/1.10     7.89   5.39     8.45
1.10/3.10     8.35   6.21     7.60
3.10/5.10     9.83   6.22     11.96
7.10/8.10     8.71   6.16     10.22
9.10/10.10    7.70   4.46     8.90
11.10/12.10   7.46   5.48     7.20
12.10/13.10   7.26   5.39     6.25
13.10/14.10   6.43   4.25     6.35
14.10/15.10   9.35   6.91     9.46
23.10/24.10   9.24   6.15     8.82
25.10/26.10   7.26   6.04     6.27
1.11/2.11     6.77   4.89     7.29
Avg.          8.02   5.63     8.23

Table 2: The mean, median and standard deviation of the absolute prediction error (°C) for the data sets.

Data set      Error <5°C   Error <10°C   Error >15°C
29.9/1.10     0.48         0.72          0.14
1.10/3.10     0.42         0.71          0.13
3.10/5.10     0.39         0.69          0.17
7.10/8.10     0.43         0.70          0.15
9.10/10.10    0.55         0.75          0.14
11.10/12.10   0.46         0.76          0.12
12.10/13.10   0.48         0.74          0.11
13.10/14.10   0.56         0.80          0.08
14.10/15.10   0.39         0.68          0.20
23.10/24.10   0.40         0.69          0.18
25.10/26.10   0.47         0.73          0.13
1.11/2.11     0.50         0.79          0.09
Avg.          0.47         0.73          0.14

Table 3: The proportions of predictions with an absolute error below 5°C, below 10°C and above 15°C for the data sets.
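The per-data-set figures in Tables 2 and 3 can be computed from (prediction, measurement) pairs roughly as follows. Whether the paper's standard deviation is the population or the sample version is not stated; the sketch assumes the population version (`pstdev`), and the example numbers are hypothetical.

```python
from statistics import mean, median, pstdev


def error_summary(predicted, measured):
    """Statistics of the absolute error |prediction - measurement| in degrees C."""
    errors = [abs(p - m) for p, m in zip(predicted, measured)]
    n = len(errors)
    return {
        "mean": mean(errors),
        "median": median(errors),
        "std": pstdev(errors),  # assumption: population standard deviation
        "share_below_5": sum(e < 5 for e in errors) / n,
        "share_below_10": sum(e < 10 for e in errors) / n,
        "share_above_15": sum(e > 15 for e in errors) / n,
    }


summary = error_summary([1100, 1110, 1130, 1150], [1102, 1118, 1130, 1170])
```

Here the absolute errors are 2, 8, 0 and 20°C, so half of the predictions fall within 5°C and one in four misses by more than 15°C.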

Figure 6: The most erroneous prediction, data between 14.10-15.10. The vertical axis shows the post roughing mill temperature in Celsius, while the horizontal axis shows the number of recorded observations in the data set.

Figure 7: Enlargement of a set of data from Figure 6.

In this work, an alternative method for physical control in a hot strip mill was presented. The model predicts the post roughing mill temperature of a steel slab using neural networks while the slab is still in the heating furnace. The interactions between variables were studied with Bayesian networks. Data from the Rautaruukki Steel mill were collected and pre-processed. The pre-processing consisted of variable selection and combination of data from four different database tables into time series and slabwise formats. The time series data included 254,229 observations and the slabwise format 14,666.

The Bayesian network was very useful in clarifying the dynamics of the process. It represents information obtained from an expert and allows even a researcher from a different discipline to gain basic knowledge of the complex process. It also contains information about the conditional probabilities between the variables, which may point out anomalous process behaviour. The disadvantage of the method was the ongoing rapid development of the tools, which are still quite far from the possibilities presented in theory. The implementation of the neural network model required an adaptive approach. A time window containing enough slabs to cover at least 150 observations was used to predict the post roughing mill temperature of a slab while it was still in the furnace. This prediction can be used as feedback for the control algorithm of the furnace to adjust the furnace parameters so that the slab temperature will be closer to the pre-set one. The mean absolute error of the model was 8.02°C and the median 5.63°C. Altogether 47% of the predictions were closer than 5°C to the actual value, 73% were closer than 10°C and 14% were further than 15°C away. These results are accurate enough to allow the model to be applied on the production line. One remaining problem was the large prediction errors for some slabs; the model does not yet include adaptive confidence intervals, which might help with this.

[1] Sbarbaro-Hofer D., Neumerkel D., Hunt K.: Neural Control of a Steel Rolling Mill, IEEE Control Systems Magazine, Vol. 13, No. 3, pp. 69-75, June 1993.
[2] Maheral P., Ide K., Gomi T., Pussegoda N., Too J.J.M.: Artificial Intelligence Techniques in the Hot Rolling of Steel, Canadian Conference on Electrical and Computer Engineering, Vol. 1, pp. 507-510, 1995.
[3] Liu H., Motoda H.: Feature Transformation and Subset Selection, IEEE Intelligent Systems, March/April 1998.
[4] Pudil P., Novovičová J.: Novel Methods for Subset Selection with Respect to Problem Knowledge, IEEE Intelligent Systems, March/April 1998.
[5] Ramoni M., Sebastiani P.: Bayesian Methods for Intelligent Data Analysis, KMi Technical Report KmiTR-67, July 1998.
[6] Jensen F. V.: An Introduction to Bayesian Networks, UCL Press, 1996.
[7] Pearl J.: The New Challenge: From a Century of Statistics to an Age of Causation, Technical Report (R249), in Computing Science and Statistics, 29(2), pp. 415-423, 1997.
[8] Heckerman D., Geiger D., Chickering D.: Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, Machine Learning, 1995.
[9] Castillo E., Gutiérrez J.M., Hadi A.S.: Expert Systems and Probabilistic Network Models, Springer-Verlag, 1997.
[10] Pearl J.: Causality: Models, Reasoning, and Inference, Cambridge University Press, 2000.
[11] Bishop C.: Neural Networks for Pattern Recognition, Oxford University Press, 1995.
[12] Ramoni M., Sebastiani P.: Bayesian Knowledge Discoverer, a publicly available program for forming Bayesian networks, http://kmi.open.ac.uk/projects/bkd/, referenced 20.9.2000.
[13] Myllymäki P., Tirri H.: Bayes-verkkojen mahdollisuudet (The possibilities of Bayesian networks), TEKES, teknologiakatsaus 58/98, Helsinki, 1998. (In Finnish only)