# Monotonic regression for assessment of trends in environmental quality data

European Congress on Computational Methods in Applied Sciences and Engineering, ECCOMAS 2004. P. Neittaanmäki, T. Rossi, K. Majava, and O. Pironneau (eds.); V. Capasso and W. Jäger (assoc. eds.). Jyväskylä, 24-28 July 2004.

MONOTONIC REGRESSION FOR ASSESSMENT OF TRENDS IN ENVIRONMENTAL QUALITY DATA

Mohamed Hussian¹, Anders Grimvall¹, Oleg Burdakov¹, and Oleg Sysoev²

¹ Department of Mathematics, Linköping University, SE-58183 Linköping, Sweden. E-mail: mohus@mai.liu.se, angri@mai.liu.se, olbur@mai.liu.se. Web site: www.mai.liu.se

² Faculty of Control and Applied Mathematics, Moscow Institute of Physics and Technology, Institutskij per. 9, Dolgoprudnyj, Moscow region 141700, Russia. E-mail: osysoev@mail.ru

Keywords: Monotonic regression, Response surface, Time series decomposition, Normalisation

Abstract. Monotonic regression is a non-parametric method that is designed especially for applications in which the expected value of a response variable increases or decreases in one or more explanatory variables. Here, we show how the recently developed generalised pool-adjacent-violators (GPAV) algorithm can greatly facilitate the assessment of trends in time series of environmental quality data. In particular, we present new methods for simultaneous extraction of a monotonic trend and seasonal components and for normalisation of environmental quality data that are influenced by random variation in weather conditions or other forms of natural variability. The general aim of normalisation is to clarify the human impact on the environment by suppressing irrelevant variation in the collected data. Our method is designed for applications that satisfy the following conditions: (i) the response variable under consideration is a monotonic function of one or more covariates; (ii) the anthropogenic temporal trend is either increasing or decreasing; (iii) the seasonal variation over a year can be defined by one increasing and one decreasing function.
Theoretical descriptions of our methodology are accompanied by examples of trend assessments of water quality data and normalisation of the mercury concentration in cod muscle in relation to the length of the analysed fish.

1. INTRODUCTION

Monotonic responses and relationships are widespread in all types of environmental systems. For example, it is common that the rates of chemical and microbial processes increase with temperature. Also, the concentrations of many contaminants in living organisms increase with the age or size of the analysed individual, and fluxes of substances through terrestrial and aquatic systems can increase with the amount and intensity of precipitation. The simplest forms of monotonic relationships can easily be described by using an appropriate parametric model, and numerous algorithms have been developed to fit such models to observed data. However, more complex relationships involving two or more explanatory variables can require non-parametric modelling. This is especially true if the response includes a threshold effect or is strongly non-linear in some other respect.

Monotonic regression is a non-parametric method that is designed for applications where the expected value of a response variable $y$ increases or decreases in one or more explanatory variables $x_1, \ldots, x_p$. The most commonly used computational method for this type of regression is the so-called pool-adjacent-violators (PAV) algorithm [1, 2, 3]. When $p = 1$, this algorithm is computationally efficient, and it provides solutions that are optimal in the sense that the mean square error is minimised. When $p > 1$, the PAV algorithm has proven useful for estimating monotonic responses to explanatory variables that are varied at only a few levels [4, 5, 6, 7].
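For the case $p = 1$, the pooling idea behind the PAV algorithm can be illustrated with a minimal sketch. It assumes unweighted, non-empty observations that are already sorted by the explanatory variable and a non-decreasing response; the function name `pav_nondecreasing` is hypothetical and is not taken from the cited reports.

```python
import numpy as np

def pav_nondecreasing(y):
    """Least-squares non-decreasing fit to y (values ordered by the covariate)."""
    blocks = []  # each block holds [sum of its values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Pool the last two blocks while their means violate the ordering.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Every observation in a block receives the block mean as fitted value.
    return np.concatenate([np.full(count, total / count)
                           for total, count in blocks])

fitted = pav_nondecreasing([1.0, 3.0, 2.0, 4.0, 3.5, 5.0])
```

In this example the adjacent violators (3.0, 2.0) and (4.0, 3.5) are each replaced by their means, which gives a non-decreasing sequence of fitted values. A non-increasing fit can be obtained by applying the same function to the negated data and negating the result.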
However, it was not until Burdakov and colleagues [8, 9] recently generalised the PAV algorithm from fully to partially ordered data that it became feasible to handle typical regression data that include one or more continuous variables. The cited reports also explain the ways in which the generalised pool-adjacent-violators (GPAV) algorithm is superior to currently used algorithms that are based on simple averaging techniques [10, 11, 12] or quadratic programming [13, 14].

Here, we show that the GPAV algorithm has important applications in several areas of environmental science and management. In particular, we illustrate how this algorithm can be used in the following contexts:

(i) estimation of response surfaces that are known to be monotonic in two or more variables;
(ii) simultaneous extraction of seasonal components and a monotonic trend from a univariate time series;
(iii) normalisation of time series of environmental quality data.

The first of these tasks is also highly relevant in many areas other than environmental science; for example, monotonic regression is often appropriate for estimating dose-response curves in experimental studies [6, 7]. The second task entails time series decomposition, which is a classical undertaking in official statistics (e.g., [15]). The method we present takes into account that many seasonal patterns in the environment can be decomposed into one increasing and one decreasing phase. The third task, normalisation or adjustment, aims to clarify the human impact on the environment by removing weather-dependent fluctuations or other natural variability in the collected data [16].

2. ESTIMATION OF A MONOTONIC RESPONSE IN TWO OR MORE EXPLANATORY VARIABLES

As we already pointed out, the GPAV algorithm is particularly useful when the expected response is monotonic in two or more explanatory variables and at least one of these variables is continuous.
Such situations arise naturally when evaluating time series of environmental quality data for temporal trends. First, interest is often focused on monotonic trends. Second, almost all measurements of the state of the environment are influenced by weather conditions or other covariates, and it is more the rule than the exception that the relationships between the response variable and the covariates under consideration are monotonic. Figures 1a and 1b illustrate how monotonic regression can be used to describe data on the concentration of mercury in Atlantic cod (Gadus morhua) in relation to sampling year and body length. The increase in mercury with increasing size of the fish is obvious in the two diagrams. In addition, the response surface in Figure 1b indicates a downward temporal trend.

Figure 1a. Concentration of mercury in muscle tissue from Atlantic cod (Gadus morhua) caught in the North Sea (53° 10′ N, 2° 5′ E). The data represent observed concentrations (ng Hg/g ww) in relation to sampling year and body length of the analysed fish. [Axes: concentration (ng Hg/g ww), year, fish length (cm).]

Figure 1b. Concentration of mercury in muscle tissue from Atlantic cod (Gadus morhua) caught in the North Sea (53° 10′ N, 2° 5′ E). The response surface was obtained by first using the GPAV algorithm for monotonic regression and then employing locally weighted scatter-plot smoothing to extrapolate the fitted regression values to a fine grid (see also section 4.3).

3. SIMULTANEOUS ESTIMATION OF A MONOTONIC TREND AND SEASONAL EFFECTS

When data are collected over several seasons, monotonic regression models may appear to be inadequate. However, many seasonal patterns can be decomposed into increasing and decreasing phases, and this enables the use of various approaches based on monotonic regression.
If we let $y_1, y_2, \ldots, y_n$ denote a time series of data collected over $m$ seasons, and let $\hat{y}_i$ denote the sum of the trend and seasonal components at time $i$, it is possible to determine $\hat{y}_i$ by minimising

$$S = \sum_i (y_i - \hat{y}_i)^2$$

under a set of simple constraints, and we can also introduce these constraints by employing a monotonic regression model. Let us, for the sake of clarity, consider a seasonal pattern of the type illustrated in Figure 2. Let us also assume that we would like to extract a non-increasing trend function from the collected data. If we then perform a monotonic regression using sampling year and the variables $x_1$ and $x_2$ as explanatory variables, the fitted values $\hat{y}_i$ must be non-increasing for each season, i.e.,

$$\hat{y}_i \ge \hat{y}_{i+m}, \quad i = 1, \ldots, n-m.$$

In addition, the fitted values representing different seasons in the same year must have non-increasing and non-decreasing phases with the same duration as in Figure 2.

Figure 2. Seasonal pattern comprising increasing and decreasing phases, and a possible coding of these phases.

Figures 3a and 3b illustrate a set of monthly flow-weighted concentrations of total nitrogen in the Elbe River and the monotonic trend and seasonal components that could be extracted from these data. The goodness-of-fit to observed data reached a maximum when we let the seasonal effects have a maximum in March and a minimum in August. Furthermore, it can be noted that the downward trend was particularly strong after the reunification of Germany in 1990.

Figure 3a. Monthly mean concentrations of total nitrogen (Tot-N) measured in the Elbe River at Brunsbüttel downstream of Hamburg. [Axes: Tot-N conc. (mg/l), year, month.]
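The coding of seasonal phases into the two variables $x_1$ and $x_2$ can be sketched as follows. This is only an illustration under stated assumptions: seasons are numbered from 0, the first season lies in an increasing phase, and the maximum precedes the minimum within the year. The function names `seasonal_coding` and `trend_is_non_increasing` are hypothetical. With a maximum in March and a minimum in August, the sketch gives the monthly codes listed for Figure 2, from Jan (0, 0) to Dec (6, -5).

```python
def seasonal_coding(n_seasons, max_season, min_season):
    """Return one (x1, x2) pair per season, seasons numbered 0..n_seasons-1.

    x1 counts the steps taken in increasing phases and x2 counts, with a
    negative sign, the steps taken in the decreasing phase. A response that
    is non-decreasing in both x1 and x2 then rises up to max_season, falls
    until min_season and rises again afterwards.
    """
    codes = []
    x1, x2 = 0, 0
    for s in range(n_seasons):
        if s > 0:
            if min_season >= s > max_season:
                x2 -= 1   # step within the decreasing phase
            else:
                x1 += 1   # step within an increasing phase
        codes.append((x1, x2))
    return codes

def trend_is_non_increasing(fitted, m):
    """Check the constraint fitted[i] >= fitted[i + m] for every valid i."""
    return all(fitted[i] >= fitted[i + m] for i in range(len(fitted) - m))

# Months Jan..Dec as 0..11, maximum in March (2), minimum in August (7).
codes = seasonal_coding(12, max_season=2, min_season=7)
```

The second helper checks the trend constraint only; it does not fit anything. The fit itself requires a monotonic regression on sampling year, $x_1$ and $x_2$.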
[Figure 2 panel: seasonal effect by month, Jan to Dec; vertical axis from -0.30 to 0.25.]

Coding of the seasonal phases in Figure 2:

| Month | x1 | x2 |
|-------|----|----|
| Jan   | 0  | 0  |
| Feb   | 1  | 0  |
| Mar   | 2  | 0  |
| Apr   | 2  | -1 |
| May   | 2  | -2 |
| Jun   | 2  | -3 |
| Jul   | 2  | -4 |
| Aug   | 2  | -5 |
| Sep   | 3  | -5 |
| Oct   | 4  | -5 |
| Nov   | 5  | -5 |
| Dec   | 6  | -5 |

Figure 3b. Response surface obtained by applying monotonic regression to monthly mean concentrations of total nitrogen (Tot-N) measured in the Elbe River at Brunsbüttel downstream of Hamburg. [Axes: fitted Tot-N conc. (mg/l), year, month.]

4. NORMALISATION OF TIME SERIES OF ENVIRONMENTAL QUALITY DATA

4.1 Normalisation formulae

The general aim of normalisation is to remove irrelevant variation in the collected data. The basic idea is simple. If observations of meteorological or other naturally fluctuating variables make us believe that the studied response variable takes a value that is $c$ units higher than the average response, then normalisation implies that we subtract this expected increase $c$ from the observed response. A general probabilistic framework for normalisation was recently presented by Grimvall and co-workers [16]. Here, we discuss normalisation based on monotonic regression models.

Let us assume that the observed values of the response variable $y$ have the general form

$$y_i = f(x_{1i}, \ldots, x_{pi}) + \varepsilon_i, \quad i = 1, \ldots, n,$$

where $f$ is a deterministic function of $p$ explanatory variables $x_1, \ldots, x_p$, and $\varepsilon_i$, $i = 1, \ldots, n$, denotes a sequence of independent, identically distributed random errors with mean zero. We can then normalise the observed responses with respect to $x_1, \ldots, x_q$ by forming

$$\tilde{y}_i = y_i - \left( \hat{f}(x_{1i}, \ldots, x_{qi}, x_{q+1,i}, \ldots, x_{pi}) - \hat{f}(\tilde{x}_{1i}, \ldots, \tilde{x}_{qi}, x_{q+1,i}, \ldots, x_{pi}) \right),$$

where $\hat{f}$ denotes an estimate of $f$ and $(\tilde{x}_{1i}, \ldots, \tilde{x}_{qi})$, $i = 1, \ldots, n$, is a sequence of given values of $x_1, \ldots, x_q$. Typically, these given values represent averages taken over the entire data set or subsets thereof. For example, if data are collected over several seasons, we can let $(\tilde{x}_{1i}, \ldots, \tilde{x}_{qi})$, $i = 1, \ldots, n$, represent seasonal means of $x_1, \ldots, x_q$.
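The normalisation step can be sketched in a few lines once an estimate of the response function is available. The sketch below assumes that the estimate is supplied as a callable `f_hat` that accepts one row of covariate values; this callable, the function name `normalise`, and the toy linear surface in the example are hypothetical stand-ins, not fitted results from this study.

```python
import numpy as np

def normalise(y, X, f_hat, q, reference):
    """Subtract the estimated effect of the first q covariates.

    Computes y_i - (f_hat(x_i) - f_hat(x_i with its first q entries
    replaced by the given reference values)).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    X_ref = X.copy()
    # reference may be a scalar, a length-q vector or an (n, q) array
    X_ref[:, :q] = reference
    fitted = np.array([f_hat(row) for row in X])
    fitted_ref = np.array([f_hat(row) for row in X_ref])
    return y - (fitted - fitted_ref)

# Toy example: a made-up response surface and three observations.
f_hat = lambda x: 2.0 * x[0] + x[1]
X = np.array([[1.0, 10.0], [3.0, 10.0], [2.0, 11.0]])
y = np.array([12.5, 15.5, 15.0])
y_norm = normalise(y, X, f_hat, q=1, reference=X[:, :1].mean(axis=0))
```

Here the first covariate is replaced by its overall mean, so an observation taken at an unusually high value of that covariate is adjusted downwards and one taken at a low value is adjusted upwards.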
Regardless of how the normalisation is carried out, we must be able to estimate the values of $f$ at two sets of points: an estimation set

$$A = \{(x_{1i}, \ldots, x_{pi}),\ i = 1, \ldots, n\}$$

for which we have observed response values $\{y_i,\ i = 1, \ldots, n\}$, and an evaluation set

$$B = \{(\tilde{x}_{1i}, \ldots, \tilde{x}_{qi}, x_{q+1,i}, \ldots, x_{pi}),\ i = n+1, \ldots, n+m\}$$

for which no observations exist. The GPAV algorithm provides estimates of $f$ for all points in the estimation set. It remains to extrapolate $\hat{f}$ to the evaluation set under the constraint that $\hat{f}$ is monotonic in each of the coordinates.

4.2 Extrapolation of monotonic responses to new points

Let $\chi = (\chi_1, \ldots, \chi_p)$ be a point to which $\hat{f}$ shall be extrapolated from a given estimation set $A$. We can then define two subsets of $A$. The first subset

$$L_\chi = \{(x_{1i}, \ldots, x_{pi}) \in A;\ x_{ki} \le \chi_k,\ k = 1, \ldots, p\}$$

contains all points in $A$ that are dominated by $\chi$. The second subset

$$U_\chi = \{(x_{1i}, \ldots, x_{pi}) \in A;\ x_{ki} \ge \chi_k,\ k = 1, \ldots, p\}$$

comprises all points in $A$ that dominate $\chi$. Let us also for the moment assume that both $L_\chi$ and $U_\chi$ are nonempty. Then the expression

$$y_L = \max\{\hat{f}(x_{1i}, \ldots, x_{pi});\ (x_{1i}, \ldots, x_{pi}) \in L_\chi\}$$

provides a lower limit for the values of $\hat{f}(\chi)$ that can render $\hat{f}$ monotonic on the set $A \cup \{\chi\}$. Furthermore, we can identify a point $\chi_L \in L_\chi$ that minimises the distance to $\chi$ under the constraint that $\hat{f}(\chi_L) = y_L$. Likewise,

$$y_U = \min\{\hat{f}(x_{1i}, \ldots, x_{pi});\ (x_{1i}, \ldots, x_{pi}) \in U_\chi\}$$

defines an upper limit for the permissible values of $\hat{f}(\chi)$, and we can select a point $\chi_U \in U_\chi$ that minimises the distance to $\chi$ under the constraint that $\hat{f}(\chi_U) = y_U$. If $\chi$ is on the straight line between $\chi_L$ and $\chi_U$, it would be natural to use linear interpolation to assign a value to $\hat{f}(\chi)$, in other words to set

$$\hat{f}(\chi) = y_L + \frac{\|\chi - \chi_L\|}{\|\chi_U - \chi_L\|}\,(y_U - y_L),$$

where $\|u\|$ denotes the length of the vector $u$. Regardless of the location of $\chi$ we can set

$$\hat{f}(\chi) = y_L + \frac{\langle \chi - \chi_L,\ \chi_U - \chi_L \rangle}{\langle \chi_U - \chi_L,\ \chi_U - \chi_L \rangle}\,(y_U - y_L),$$

where $\langle u, v \rangle$ denotes the scalar product of the vectors $u$ and $v$.
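The extrapolation to a single new point can be sketched as below. It is a minimal illustration that assumes the fitted values on $A$ are already monotonic and measures distance with the Euclidean norm; the function name `extrapolate` is hypothetical. The handling of empty sets follows the case rule given in the surrounding text (the available bound, or the overall mean if neither set has any points), and the sketch does not append the new point to $A$.

```python
import numpy as np

def extrapolate(A, f_hat_A, chi):
    """Assign a value at chi from fitted values f_hat_A on the points A."""
    A = np.asarray(A, dtype=float)
    f = np.asarray(f_hat_A, dtype=float)
    chi = np.asarray(chi, dtype=float)
    lower = np.all(chi >= A, axis=1)   # points dominated by chi
    upper = np.all(A >= chi, axis=1)   # points dominating chi
    if not lower.any() and not upper.any():
        return f.mean()                # no bound available: overall mean
    if not lower.any():
        return f[upper].min()          # only the upper limit exists
    if not upper.any():
        return f[lower].max()          # only the lower limit exists
    y_L, y_U = f[lower].max(), f[upper].min()
    # Nearest points to chi that attain the lower and upper limits.
    cand_L = A[lower & (f == y_L)]
    cand_U = A[upper & (f == y_U)]
    chi_L = cand_L[np.argmin(np.linalg.norm(cand_L - chi, axis=1))]
    chi_U = cand_U[np.argmin(np.linalg.norm(cand_U - chi, axis=1))]
    d = chi_U - chi_L
    denom = d @ d
    if denom == 0.0:
        return y_L                     # chi coincides with a point of A
    # Project chi - chi_L onto chi_U - chi_L and interpolate linearly.
    t = (chi - chi_L) @ d / denom
    return y_L + t * (y_U - y_L)

A = [[0, 0], [2, 2], [0, 2], [2, 0]]
f_hat_A = [0.0, 4.0, 2.0, 2.0]
value = extrapolate(A, f_hat_A, [1, 1])
```

Because $\chi - \chi_L$ and $\chi_U - \chi$ have non-negative coordinates, the projection weight `t` lies between 0 and 1, so the assigned value stays between the lower and upper limits.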
If $L_\chi$ or $U_\chi$ is empty, we assign values to $\hat{f}(\chi)$ as follows:

$$\hat{f}(\chi) = \begin{cases} y_U, & \text{if } L_\chi \text{ is empty and } U_\chi \text{ is nonempty} \\ y_L, & \text{if } U_\chi \text{ is empty and } L_\chi \text{ is nonempty} \\ \bar{y}, & \text{if both } L_\chi \text{ and } U_\chi \text{ are empty} \end{cases}$$

where $\bar{y}$ is the mean response for the elements in the entire estimation set or a subset of elements within a fixed distance to $\chi$.

The procedure described above can be repeated for an arbitrary set of points. However, it is important to note that the estimation set $A$ must be updated from $A$ to $A \cup \{\chi\}$ each time $\hat{f}$ has been extrapolated to a new point $\chi$. Otherwise, there may be pairs of extrapolated values for which the monotonicity is violated. If the evaluation set is large, the above-mentioned procedure can be computationally cumbersome. Hence there is also a need for extrapolation procedures that can provide an approximately monotonic response surface over a large set of points. For example, it can be convenient to use kernel smoothing or locally weighted scatter-plot smoothing [17] to extrapolate the fitted responses in a monotonic regression to a fine grid of values of the explanatory variables (see Figure 1b).

4.3 Normalisation of contaminants in fish

Simple time series plots of observed concentrations of mercury in Atlantic cod caught in the middle of the North Sea (53° 10′ N, 2° 5′ E) indicate a downward trend (Figure 4a). However, this may, at least in part, be a spurious trend caused by temporal changes in the lengths of the analysed fish. Hence, it is of great interest to normalise the observed mercury concentrations with respect to fish length. Figure 4b illustrates the results obtained by using the normalisation procedure described in sections 4.1 and 4.2. Apparently, the mercury trend after normalisation