Assessment of metal pollution based on multivariate statistical
modeling of hot spot sediments from the Black Sea
V. Simeonov a,*, D.L. Massart b, G. Andreev c, S. Tsakovski a

a Chair of Analytical Chemistry, Faculty of Chemistry, University of Sofia "St. Kl. Okhridski", J. Bourchier Blvd. 1, Sofia 1126, Bulgaria
b Vrije Universiteit Brussel, Pharmaceutical Institute, Pharmaceutical and Biomedical Analysis, Laarbeeklaan 103, 1090 Brussels, Belgium
c Institute of Oceanology, Bulgarian Academy of Sciences, P.O. Box 152, Varna 9000, Bulgaria

Received 25 August 1999; accepted 24 November 1999

Abstract

The paper deals with the application of different statistical methods, namely cluster analysis, principal components analysis (PCA) and partial least squares (PLS) modeling. These approaches are an efficient tool for achieving a better understanding of the contamination of two gulf regions of the Black Sea. The objects of the study are marine sediment samples collected from the "hot spot" gulf areas of Varna and Bourgas. In the present case, cluster analysis and PCA make it possible to separate three zones of the marine environment with different levels of pollution by interpretation of the sediment analysis (Bourgas gulf, Varna gulf and the lake buffer zone). Further, the extraction of four latent factors offers a specific interpretation of the possible pollution sources and separates natural from anthropogenic factors, the latter originating from contamination by chemical, oil refinery and steel-work enterprises. Finally, PLS modeling predicts contaminant concentrations from one or more tracer elements better than the one-dimensional approach of the baseline models. The results of the study are important not only locally, since they allow a quick response in finding solutions and making decisions, but also in a broader sense as a useful environmetrical methodology. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Multivariate statistics; Cluster analysis; Principal component analysis; Partial least squares modeling; Marine sediments; Heavy metals pollution

1. Introduction

There is growing concern all over the world about the extent of sediment contamination in coastal regions (National Research Council, 1989; Environmental Protection Agency, 1992). The significance of sediment contamination for the overall condition of the aquatic environment has long been recognised. Most contaminants discharged into coastal waters rapidly become associated with marine particulate matter and are incorporated in sediments. Although diagenetic processes in the sediments modify and redistribute contaminants between the solid and water phases, immobilization by sedimentation dominates for most of the typical pollutants. As already indicated (Martin and Whitfield, 1983), the accumulation of contaminant metals in coastal sediments provides a record of the spatial and temporal history of pollution. That is why sediment monitoring can deliver important information on various pollution events.

Chemosphere 41 (2000) 1411–1417
* Corresponding author. E-mail address: VSimeonov@chem.uni-sofia.bg (V. Simeonov).
0045-6535/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0045-6535(99)00540-8

Large-scale studies in the USA are dedicated to trace element contaminants in sediments and are carried out by large organizations like the National Oceanic and Atmospheric Administration (NOAA) through programs such as the National Status & Trends (NS&T) Program for Marine Environment (Cantillo and O'Connor, 1992; Hanson et al., 1993; Daskalakis and O'Connor, 1995). These programs make it possible to compare contaminant metal levels over large scales of distance and time. In this case, however, a method for separating contaminant and natural components, as well as natural and anthropogenic contaminant levels, is very important. This holds especially true for the evaluation of pollution in "hot spots", i.e., sampling sites strongly influenced by human activity.
Usually, these are small zones of higher contamination due to specific anthropogenic point sources (discharge points, industrial regions). In principle, these sites are not representative of their natural surroundings if the scope of a study is to follow a general pattern of distribution of trace metals in sediments (e.g., the world-wide pattern indicated by Cantillo and O'Connor, 1992). But the "hot spots" become very interesting when local pollution is to be evaluated or modeled. In this case it is important even to rank the pollution level of different sites in a larger region in order to get information on the range of the anthropogenic influences or to separate them from the unavoidable natural variability of the sediment content.

There is a variety of approaches for reducing natural variability and improving the statistical power in data intercomparison. Metal concentrations are often normalized to a conservative component (very often aluminium or iron for sediment samples) whose levels are unaffected by contaminant inputs (DeGroot et al., 1976; Forstner and Wittmann, 1981; Loring, 1990; Luoma, 1990). The association between the metals and Al (or Fe) has a geochemical foundation which can be used as a basis for the development of statistical models. A very interesting baseline regression model for a variety of components on Al was introduced by Hanson et al. (1993). Based on the observed covariation of elements at 15 estuaries remote from contaminant inputs, linear regressions of metals on Al were used to model the metal content in baseline sediments. A geochemical model for the covariation was developed, verified and used to guide the statistical modeling approach. Using these baseline relationships, sediment metal concentrations can be partitioned into natural and anthropogenic fractions. Hanson et al.
(1993) believe that this approach works better than the more complex and more abstract approaches of multivariate statistics such as principal components analysis (PCA), principal components regression (PCR), etc.

It has to be mentioned that the baseline model relies on a substantial amount of data from non-polluted coastal areas, which can easily be found along the extensive coastline of the USA and with the data support of programs like NS&T. Very often, however, it is necessary to estimate statistically or to model "hot spot" areas by direct sampling and analysis of sediments from a limited number of sites. Quick decisions have to be made with respect to local pollution events. In this case the multivariate statistical approaches seem to deliver more substantial information on the links between sampling sites, pollutant concentrations, the latent factors responsible for the data set structure and pollutant source apportionment. As stated by Einax et al. (1997) and Einax and Soldt (1999), the complexity and the large variance of environmental data sets limit the use of common statistical methods for the assessment of the state of pollution, so the application of geostatistical and multivariate statistical methods is recommended.

The Bulgarian Black Sea coast is known as a recreation area, and this requires constant environmetrical control of all marine phases, since a large number of industrial "hot spots" are also located near the shore (an oil refinery near Bourgas gulf, cement and chemical plants around Varna gulf, steel-works, etc.). Usually, the reaction of the authorities to monitoring data is slow due to the lack of clear information about the real state of pollution or to the temporary and local character of the pollution events.
It is the aim of the present study to perform a chemometric analysis of metal concentrations in sediments collected at different "hot spot" sites of the Bulgarian Black Sea coast in order to offer a more informative and rapid assessment of the state of pollution.

2. Experimental

2.1. Sampling and sample analysis

Sediment samples were taken from five different sampling areas: Lake Beloslavsko (10 sites, sample numbers 1–10, close to a glass production factory), Lake Varnensko (11 sites, sample numbers 11–21, close to a steel-work), Varna Gulf close to Lake Varnensko (7 sites, sample numbers 22–28), Varna Gulf near the coast (7 sites, sample numbers 29–35, close to the cement and chemical (Solvay-soda) plants) and Bourgas Gulf, near the waste inlets of the local oil refinery (4 sites, sample numbers 36–39). It is worth noting that, in the configuration of the coastline, the two lakes mentioned (Beloslavsko and Varnensko) serve as a natural buffer zone between the industrial zone and the gulf of Varna. For the gulf of Bourgas no such zone exists, and there is a direct inlet of contaminated waters into the sea.

Sediment samples were obtained with a standard Smith–McIntyre bottom grab (Hanson et al., 1993) or a box corer. Three grabs or cores were made at each site (depths between 8 and 50 m). Composite samples were made from the sediments in the cores. Sediment analysis for a site consisted of inorganic (metal components) analysis. In principle, the standard procedure suggested by NOAA (Shigenaka and Lauenstein, 1988) was followed as closely as possible.

The elements measured throughout this study were Cu, Pb, Mn, Zn, Co, Cd, Cr, Fe, Ni and As. Digestion in concentrated hydrofluoric acid and subsequent analysis by atomic absorption were used for quantification.
ETAAS (graphite furnace AAS, Perkin Elmer Z/3030) was the analytical method used to determine Cu, Pb, Co, Cd, Cr, Ni and As, and flame AAS (Perkin Elmer 603) was used for Mn, Fe and Zn. Certified reference materials (MESS-1, BCSS-1 and NBS 1646) were run with each series of samples. A measured value for the reference material within the offered tolerance was the criterion for bias control and for checking the data quality. Precision for Fe, Mn and Zn was ≤5% (as relative standard deviation, RSD); for Cu, Cd, Cr, Co, As, Ni and Pb it was ≤10% RSD.

The experimental data for 39 sampling sites, each analyzed for 10 components, are presented in Table 1 as a basic statistical summary of the input set. Individual data are available on request.

2.2. Statistical analysis

Throughout the study various statistical approaches were used, among them regression analysis (linear and PCR), cluster analysis, PCA and partial least squares (PLS) modeling. All of these methods are well known and described in the literature; only a small part of the extensive bibliography is mentioned here (Draper and Smith, 1981; Massart and Kaufman, 1983; Malinowski, 1991; Einax et al., 1997). Since there are different opinions on the use of multivariate statistical methods in environmental analysis, varying from sceptical acceptance as "complex and to some extent formal" (Hanson et al., 1993) to the solid conviction of their being "a useful tool for the best evaluation and interpretation of environmental data" (Einax and Soldt, 1999), it was the aim of this study to compare these two extreme judgments in a practical way. Throughout the study the software packages SPSS 8.0, UNSCRAMBLER 5.5 and STATISTICA 5.0 were used.

The possibility of analyzing multidimensional data sets without information about the spatial locations is the advantage of multivariate statistics.
We consider that the basic principles and theoretical fundamentals are known to a substantial part of the readers, so there is no need for a detailed explanation (Einax and Soldt, 1999). Many studies offer algorithms for different multivariate statistical approaches. For instance, cluster analysis is carried out to reveal specific linkages between sampling sites, which indicate similarities or dissimilarities in their trace metal contamination. Further, PCA is applied to detect the "hidden" structure of the data set and to explain the influence of latent factors on the data distribution. Regression analysis, either in its classical linear form or as a more sophisticated multivariate regression (first, decomposition of the X-variables by PCA and then regression of each Y-variable onto the decomposed X-matrix; Esbensen et al., 1994), makes it possible to create trend and prediction models for each contaminant. Finally, PLS modeling allows a more detailed connection to be found between, e.g., natural and possible anthropogenic sources for the sediment diagenesis. It may be stated that a semiquantitative assessment of the polluted area is feasible by means of multivariate analysis.

In the present chemometric study Ward's and single linkage clustering methods (Euclidean distance as similarity measure), Varimax rotation for PCA on autoscaled data and PLS algorithms for multivariate regression (Esbensen et al., 1994) were applied.

Table 1
Descriptive statistics of the input data (concentrations in mg/kg)

Element   Mean value   Highest value   Lowest value       S.D.      S.E.
Cu             80.00          786.00           3.00     140.76     22.54
Pb             15.96          118.80           1.55      24.76      3.97
Mn            302.92         1710.00          73.67     371.85     59.54
Zn             58.97          265.54          14.77      46.98      7.52
Co              3.69           13.00           0.01       2.50      0.40
Cd              0.81            4.29           0.01       1.12      0.18
Cr             25.59          109.52           3.00      25.54      4.09
Fe           9554.12        98200.00           0.85   26005.57   4164.22
Ni             12.96           28.00           1.00       8.55      1.37
As             29.17          222.72           0.01      52.17      8.35

3. Results and discussion

In order to compare the one-dimensional approach with the multivariate one, linear regression of the sediment components on iron was performed. The regression models are summarized in Table 2. Iron was chosen as tracer since it is a typical natural component of the Black Sea sediments and its use as a regressor in many sediment studies is acknowledged (Simeonov and Andreev, 1989). Acceptable linear regression models are obtained for Cu, Mn and Co, i.e., for components which are present in higher natural concentrations in the sediments (Martens and Naes, 1991). Since no real baseline data are available for the region of interest, nothing specific could be concluded about the ecological situation at the "hot spots" on the Bulgarian Black Sea coast. It also does not seem very reliable to predict the anthropogenically influenced concentrations (Pb, Cd, Ni, As) from the iron tracer.

The cluster analysis results (hierarchical clustering, Ward's method) with the sampling sites as objects are shown in Fig. 1. Altogether four clusters could be interpreted, divided into two bigger groups: the first contains heavily polluted sites from the Varna and Bourgas gulfs (near the coastline and the waste inlets; sites 29–39, located near a big chemical and cement plant (Varna) and an oil refinery (Bourgas)) and several sites from both coastal lakes located near industrial sources (sites 2, 4, 6, 7, 8, 9 from Lake Beloslavsko, located near a glass production factory; sites 14, 17, 19 from Lake Varnensko, located near a steel-work); the second indicates a moderately polluted buffer zone consisting of lake sites and Varna gulf sites close to the lake. In both big clusters two subgroups could be found.
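The clustering step can be illustrated with a minimal sketch of Ward's agglomeration on autoscaled data, written in plain Python. It is not the procedure of the statistical packages used in the study: the function names and the six-site concentration matrix are hypothetical, and the naive pairwise search is only meant to show the merging criterion (the increase in within-cluster sum of squared Euclidean distances).

```python
from math import sqrt

def autoscale(rows):
    """Column-wise z-scores (mean 0, sample standard deviation 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def centroid(points):
    """Mean vector of a list of points."""
    return [sum(c) / len(points) for c in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((p - q) ** 2 for p, q in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_clusters(rows, k):
    """Merge the cheapest pair repeatedly until k clusters remain."""
    data = autoscale(rows)
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([data[m] for m in clusters[i]],
                                 [data[m] for m in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical concentrations (mg/kg) for six sites; columns: Cu, Zn, Pb.
sites = [
    [10.0, 20.0, 2.0],
    [12.0, 22.0, 3.0],
    [11.0, 19.0, 2.5],
    [300.0, 150.0, 60.0],
    [320.0, 160.0, 55.0],
    [310.0, 140.0, 65.0],
]
groups = ward_clusters(sites, 2)
print(groups)
```

Cutting the hierarchy at two clusters separates the three low-concentration rows from the three high-concentration rows, which is the kind of split between moderately and heavily polluted sites that a dendrogram makes visible.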
In the first big cluster the subgroups represent the most severely polluted gulf areas (sites 29–35 from Varna gulf and sites 36–39 from Bourgas gulf) and the less contaminated industrial inlets of the lakes (sites 2, 4, 6–9 from Lake Beloslavsko and sites 14, 17, 19 from Lake Varnensko). In the second one they reflect the separation between one part (Lake Varnensko and the non-affected part of Varna gulf; sites 11–13, 15, 16, 18, 20, 21 and sites 22–28, respectively) and another part (Lake Beloslavsko; sites 1, 3, 5, 10) of the buffer zone moderately affected by pollutants.

Fig. 1. Hierarchical dendrogram (Ward's method of linkage) for 39 sampling sites as objects (sites 1–10 Lake Beloslavsko; sites 11–21 Lake Varnensko; sites 22–28 Varna Gulf near Lake Varnensko; sites 29–35 Varna Gulf near the coast; sites 36–39 Bourgas Gulf).

Table 2
Linear regression models for elements relative to Fe for coastal sediments

Element    n   Intercept b0   Slope b1   S.E. of b0   S.E. of b1       r²
Cu        39          36.67     0.0045        13.30      0.00049     0.71
Pb        39          16.28   -0.00003         4.29      0.00016    0.001
Mn        39         176.16     0.0013        24.00      0.00088     0.87
Zn        39          56.57    0.00025         8.06      0.00029     0.02
Co        39           3.33    0.00004         0.40     0.000015     0.87
Cd        39           0.92   -0.00001         0.19    0.0000068     0.07
Cr        39          27.82   -0.00023         4.30      0.00016     0.06
Ni        39          12.87    0.00001         1.38     0.000054   0.0001
As        39          33.14   -0.00042         8.84      0.00032     0.04

The next step in the multivariate statistical analysis was the application of PCA in order to group the chemical components by the loadings plots and the sites by the scores plots. It is interesting to note that the site scores plot (Fig. 2) reveals a more detailed description of the region of interest. The sites in the Bourgas gulf (36–39) represent an independent group (I) of a heavily polluted area (oil refinery). They are definitely separated from all other sites, and this is due to the enhanced element concentrations.
The next well-formed group (III) comprises sites from the moderately contaminated lake buffer zone (sites 1–28), which includes sites from the two lakes and the Varna gulf sites located near Lake Varnensko. The third group (II) indicates the intermediate level of pollution (higher than the buffer zone contamination but lower than that of the Bourgas gulf area) of the sites originating mainly from the Varna gulf area (sites 29–35).

The factor loading matrix is listed in Table 3. Four factors describe almost 90% of the total variance of the system. The first one contains predominantly copper, manganese and iron and could be conditionally named "natural", since these elements are typical constituents of the Black Sea marine sediments. The second factor includes zinc, cadmium and chromium, the third lead and arsenic, and the fourth nickel and cobalt. The last three factors reflect typical anthropogenic influences of heavy metals from various sources such as chemical and glass-producing plants, oil refineries, steel-works and smelting plants. The detected pollution patterns indicate the emission sources in a semiquantitative way. In Fig. 3 the 3-D loading plot (PC1 vs. PC2 vs. PC3) is presented, and the relationship between the variables is readily seen.

In the next step of the chemometrical analysis it is interesting to check whether regression models can be offered for each contaminant relative to one or another natural tracer component, e.g., metal = f(Fe) for linear regression; metal = f(PC_i) for PCR; and metal = f(Fe) or metal = f(Fe, Mn) for PLS modeling (PLS1 and PLS2) with one and more than one regressor. Again, iron was chosen as tracer for the PCR and PLS1 modeling variants (only one independent variable). In both cases an almost complete agreement between the results of linear regression and these two approaches was observed.
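One way to see why a single-tracer latent-variable model should agree with ordinary linear regression is sketched below. The sketch assumes mean-centred data and a single PLS component; with only one predictor the normalised weight reduces to a sign, so the score vector is the centred predictor itself and the regression coefficient coincides with the least-squares slope. The function names and the Fe and Cu values are hypothetical and are not the data of this study.

```python
def center(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / dot(xc, xc)

def pls1_slope_one_predictor(x, y):
    """One-component PLS1 coefficient when X has a single column."""
    xc, yc = center(x), center(y)
    cov = dot(xc, yc)
    w = 1.0 if cov >= 0 else -1.0      # X'y normalised to unit length
    t = [w * a for a in xc]            # score vector
    q = dot(t, yc) / dot(t, t)         # regression of y on the score
    return w * q                       # coefficient in terms of x

# Hypothetical tracer (Fe) and contaminant (Cu) concentrations in mg/kg.
fe = [1200.0, 3400.0, 5100.0, 8000.0, 12500.0]
cu = [40.0, 55.0, 58.0, 75.0, 91.0]

b_ols = ols_slope(fe, cu)
b_pls = pls1_slope_one_predictor(fe, cu)
print(b_ols, b_pls)
```

The same argument applies to PCR on one variable, because the only principal component of a single centred column is that column up to sign. Any gain over linear regression therefore has to come from adding a second regressor.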
A substantial improvement of the model significance was achieved with the PLS2 modeling mode, in which two tracer elements (Mn and Fe) were chosen as independent variables in the multivariate regression. The results are listed in Table 4.

Table 3
Factor loadings (Varimax normalized; marked loadings are >0.70) for four principal components (PC)

Element        PC1     PC2     PC3     PC4
Cu            0.95    0.08   -0.09    0.04
Pb            0.04   -0.04   -0.88   -0.04
Mn            0.96    0.04    0.16    0.17
Zn            0.27    0.90    0.04    0.11
Co            0.31   -0.04    0.17    0.89
Cd           -0.15    0.88   -0.26   -0.05
Cr           -0.13    0.92    0.05    0.27
Fe            0.95   -0.14    0.07    0.11
Ni           -0.01    0.48   -0.04    0.81
As           -0.14    0.16   -0.86   -0.06
% Expl. var  29.54   27.07   16.52   15.84

Fig. 2. Bivariate PCA scores plot (PC1 vs. PC2) for 39 sampling sites (sites 1–10 Lake Beloslavsko; sites 11–21 Lake Varnensko; sites 22–28 Varna Gulf near Lake Varnensko; sites 29–35 Varna Gulf near the coast; sites 36–39 Bourgas Gulf).

Fig. 3. PCA loading 3-D plot (PC1 vs. PC2 vs. PC3) for 10 chemical components.

The comparison with the linear regression models (Table 2) reveals a much better correlation for all anthropogenic constituents. This means that their concentrations in the "hot spot" areas could be predicted more reliably from the naturally occurring components.

The heavily contaminated Bourgas gulf sites form a group of outliers. This fact offers an opportunity to check whether the model adequacy is improved by eliminating the outliers. It should also be mentioned that in many observations of sediment diagenesis the enhanced concentrations of naturally occurring components could be the reason for enhanced concentrations of Cd and Pb without additional anthropogenic impact (Hanson et al., 1993). In this sense the elimination of the outliers is a good assumption for a more reliable modeling of real contaminants. In Table 5 the results of PLS2 modeling without the most polluted sites are presented (metal = f(Fe, Mn)).
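The three sets of models can be placed side by side with a short script. The r² values below are copied from Tables 2, 4 and 5; the dictionary layout and variable names are only an illustrative choice, and Mn is omitted because it acts as a tracer in the two-tracer models.

```python
# r² per element: (Fe only, Table 2; Fe + Mn with all 39 sites, Table 4;
#                  Fe + Mn without the four Bourgas gulf sites, Table 5)
r2 = {
    "Cu": (0.71, 0.81, 0.27),
    "Pb": (0.001, 0.05, 0.13),
    "Zn": (0.02, 0.27, 0.39),
    "Co": (0.87, 0.98, 0.52),
    "Cd": (0.07, 0.10, 0.13),
    "Cr": (0.06, 0.29, 0.48),
    "Ni": (0.0001, 0.13, 0.34),
    "As": (0.04, 0.09, 0.14),
}

# Change in r² when Mn is added as a second tracer (all sites).
gain_two_tracers = {el: round(v[1] - v[0], 4) for el, v in r2.items()}

# Elements whose r² rises or falls once the Bourgas gulf sites are excluded.
better_without_bourgas = sorted(el for el, v in r2.items() if v[2] > v[1])
worse_without_bourgas = sorted(el for el, v in r2.items() if v[2] < v[1])

for el, (fe_only, pls_all, pls_trimmed) in r2.items():
    print(f"{el:2s}  {fe_only:6.4f}  {pls_all:4.2f}  {pls_trimmed:4.2f}")
print("higher r² without Bourgas sites:", better_without_bourgas)
print("lower r² without Bourgas sites: ", worse_without_bourgas)
```

Read this way, every element gains from the second tracer, with the largest increases for Zn and Cr. Removing the Bourgas gulf sites raises r² for Pb, Zn, Cd, Cr, Ni and As but lowers it for Cu and Co.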
The choice of two tracers improves substantially the prediction ability of the models.

Table 4
PLS modeling using two tracers (Mn and Fe) for sediment data (all 39 sites included) a

Element   Offset   Slope   RSMED     r²
Cu         14.63    0.81   59.42   0.81
Pb         15.14    0.05   23.82   0.05
Zn         43.19    0.27   39.70   0.27
Co          0.15    0.98    1.68   0.98
Cd          0.72    0.11    1.04   0.10
Cr         18.05    0.30   21.17   0.29
Ni         11.30    0.13    7.88   0.13
As         26.63    0.10   49.21   0.09

a Relative standard mean error of determination (RSMED, Esbensen et al., 1994) is an indication of the prediction error.

Table 5
PLS modeling using two tracers (Mn and Fe) for sediment data (except for 4 heavily polluted sites in Bourgas gulf) a

Element   Offset   Slope   RSMED     r²
Cu         36.38    0.11   31.66   0.27
Pb         14.01    0.13   24.05   0.13
Zn         33.69    0.41   37.46   0.39
Co          2.10    0.38    1.63   0.52
Cd          0.77    0.13    1.06   0.13
Cr         14.58    0.47   18.82   0.48
Ni          8.56    0.33    7.14   0.34
As         27.84    0.14   49.39   0.14

a For RSMED see note to Table 4.

4. Conclusion

The high variability of the pollutant concentrations determined at various sampling sites and in various environmental phases requires very careful evaluation and interpretation. The application of different statistical methods is an efficient tool for achieving a better understanding of the state of the environment. It seems highly advisable to combine various statistical approaches instead of relying on only one of them in order to gain better information about the system of interest or to try to predict its future trends.

In the present case, cluster analysis and PCA make it possible to separate three zones of the marine environment with different levels of pollution by interpretation of the sediment analysis. Further, the extraction of latent factors (four in the present case) offers an opportunity to reveal the data structure and to separate natural from anthropogenic factors. Finally, PLS modeling predicts contaminant concentrations from one or more tracer elements better than the one-dimensional approach of the baseline models. The results of the study are important not only locally, since they allow a quick response in finding solutions and making decisions, but also in a broader sense as a useful environmetrical methodology.

Acknowledgements

One of the authors (V. Simeonov) would like to express his sincere gratitude to the Government of the Brussels-Capital Region for the financial support (Research in Brussels Grant 1999) which made this study possible.

References

Cantillo, A.Y., O'Connor, T.P., 1992. Trace element contaminants in sediments from the NOAA National Status and Trends Programme compared to data from throughout the world. Chem. Ecol. 7, 31–50.
Daskalakis, K.D., O'Connor, T.P., 1995. Distribution of chemical concentrations in US coastal and estuarine sediment. Mar. Environ. Res. 40, 389–398.
DeGroot, A.J., Salomons, W., Allersma, E., 1976. Processes affecting heavy metals in estuarine sediments. In: Burton, J.P., Liss, P.S. (Eds.), Estuarine Chemistry. Academic Press, New York.
Draper, N.R., Smith, H., 1981. Applied Regression Analysis. Wiley, New York.
Einax, J.W., Soldt, U., 1999. Geostatistical and multivariate statistical methods for the assessment of polluted soils – merits and limitations. Chemomet. Intell. Lab. Syst. 46, 79–91.
Einax, J.W., Zwanziger, H., Gneiss, S., 1997. Chemometrics in Environmental Analysis. VCH, Weinheim.
Environmental Protection Agency, 1992. In: Proceedings of the EPA's Contaminated Sediment Management Strategy Forum. Office of Water, EPA, Washington, DC.
Esbensen, K., Schonkopf, S., Mitgaard, T., 1994. Multivariate Analysis in Practice. Trondheim.
Forstner, U., Wittmann, G.T.W., 1981. Metal Pollution in the Aquatic Environment. Springer, New York.
Hanson, P.J., Evans, D.W., Colby, D.R., 1993. Assessment of elemental contamination in estuarine and coastal environments based on geochemical and statistical modeling of sediments. Mar. Environ. Res. 36, 237–266.
Loring, D.H., 1990. Lithium – a new approach for the granulometric normalization of trace metal data. Mar. Chem. 29, 155–168.
Luoma, S.N., 1990. Processes affecting metal concentrations in estuarine and coastal marine sediments. In: Furnes, R.W., Rainbow, P.S. (Eds.), Heavy Metals in the Marine Environment. CRC Press, Boca Raton.
Malinowski, E.R., 1991. Factor Analysis in Chemistry. Wiley, New York.
Martens, H., Naes, T., 1991. Multivariate Calibration. Wiley, New York.
Martin, J.M., Whitfield, M., 1983. The significance of the river input of chemical elements to the ocean. In: Wong, C.S., Boyle, E., Bruland, K.W., Burton, J.D., Goldberg, E.D. (Eds.), Trace Metals in Sea Water. Plenum Press, New York.
Massart, D.L., Kaufman, L., 1983. The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis. Wiley, New York.
National Research Council, 1989. Contaminated Marine Systems Assessment and Remediation. National Academy Press, Washington, DC.
Shigenaka, G., Lauenstein, G.G., 1988. National Status and Trends Program for marine environment quality: Benthic surveillance and mussel watch projects sampling protocols. NOAA Memo NOS 40, NOAA Office of Oceanography and Marine Assessment, Rockville.
Simeonov, V., Andreev, G., 1989. Interpretation of Black Sea sediment analytical data by the use of clustering approach. Tox. Environ. Chem. 24, 233–240.