Bénédicte Fontez
June 2019
A random process, RP \(Z_s\),is defined as a set of usually dependent random variables \(Z(s)\), one for each site \(s\) in the study area \(\Gamma\).
\[RP \; Z_s =\{ Z(s), \forall s \in \Gamma \}\] To any set of \(N\) sites correspond a vector of \(N\) random variates, whose probability (or multivariate cdf) is: \[ F(Z(s_{(1)}),\ldots,Z(s_{(N)})) = Prob(Z(s_{(1)})\leq z_1, \ldots, Z(s_{(N)}) \leq z_N)\] The set of all such N-multivariate cdf for any positive integer \(N\) constitutes the spatial law of RP \(Z_s\).
In practice, the analysis will be limited to no more than two sites at a time:
Covariance: \(C(s_i,s_j) = E[Z(s_i)Z(s_j)] - E[Z(s_i)]E[Z(s_j)]\)
Variogram: \(2\gamma(s_i,s_j) = Var[Z(s_i)-Z(s_j)]\)
Local Estimation
The variable of interest \(Z(s)\) was measured on six different sites \(\{s_A,s_B,s_C, s_D, s_E, s_F\}\).
What is the value at a new site?
The IDW or Inverse Distance Weighted interpolation gives an estimation of \(Z(s_{new})\) build as a weighted linear combination of the neighborhood values. The weight is:
Several definitions exist, among them:
\(N_{new}\) will denote the number of sites in the \(Neighborhood(S_{new})\) in the following.
Each site \(s_i\) of the neighborhood has a weight inversely proportional to the distance between (\(s_i\)) and the site to predict/estimate (\(s_{new}\)):
\[\hat{Z}(s_{new}) = \frac{\sum_{i=1}^{N_{new}} \frac{Z(s_i)}{dist(s_i, s_{new})}}{\sum_{i=1}^{N_{new}} \frac{1}{dist(s_i, s_{new})}}\]
A broader definition uses a power function of the distance between \(s_i\) and \(s_{new}\):
\[\hat{Z}(s_{new}) = \frac{\sum_{i=1}^{N_{new}} \frac{Z(s_i)}{dist(s_i, s_{new})^P}}{\sum_{i=1}^{N_{new}} \frac{1}{dist(s_i, s_{new})^P}}\]
-The IDW interpolation is an exact estimation. It gives the observed values as estimated for the sampled/observed sites.
The interpolation surface is continuous/smooth
The interpolation does not depend on the site configuration but on the distances between sites.
Dataset from the Spatial Interpolation Comparison exercise 1997 (SIC97) Reference: Journal of Geographic Information and Decision Analysis, vol.2, no.2, pp. 1-11 (1997)
467 daily rainfall measurements made in Switzerland on the 8th of May 1986 were used in SIC97 (sic_full).
From them, 100 observed data (sic_obs) were used to estimate the rainfall at the remaining 367 locations.
data(sic97)
mydata<-st_as_sf(sic_full)
mydataobs<-st_as_sf(sic_obs)
mydata$obs<- mydata$ID %in% mydataobs$ID
crs.suisse<-"+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +ellps=bessel +towgs84=674.4,15.1,405.3,0,0,0,0 +units=m +no_defs +x_0=-55000 +y_0=14000"
st_crs(mydata)<-crs.suisse
myobs<-mydata[mydata$obs,]
mapview(mydata,zcol="rainfall")+mapview(myobs, col.regions="red", alpha.regions=0.5)
chull<-mydata %>% st_union() %>% st_convex_hull()
mydata.points<-mydata %>% st_make_grid(n=100) %>% st_intersection(chull) %>% st_centroid()
mydata.grid<-mydata %>% st_make_grid(n=100) %>% st_intersection(chull)
mapview(mydata,zcol="rainfall")+mapview(mydata.grid, col.regions="red")
# IDW interpolation (needs package stars)
obs_idw <- idw(rainfall~1,locations=myobs,newdata=st_sf(mydata.points))
## [inverse distance weighted interpolation]
st_geometry(obs_idw)<-st_geometry(mydata.grid) # to assign polygons
#obs_idw <- idw(rainfall~1,locations=myobs,mydata)
summary(obs_idw)
## var1.pred var1.var geometry
## Min. : 13.97 Min. : NA POLYGON :7169
## 1st Qu.:136.90 1st Qu.: NA epsg:NA : 0
## Median :173.92 Median : NA +proj=some...: 0
## Mean :186.25 Mean :NaN
## 3rd Qu.:227.96 3rd Qu.: NA
## Max. :566.62 Max. : NA
## NA's :7169
full_idw <- idw(rainfall~1,locations=mydata,newdata=st_sf(mydata.points))
## [inverse distance weighted interpolation]
#obs_idw["var1.pred"] %>% as_Spatial() %>% spplot()
mapview(obs_idw["var1.pred"], zcol="var1.pred",lwd=0)+mapview(myobs,zcol="rainfall")
spplot(as_Spatial(obs_idw["var1.pred"]))
spplot(as_Spatial(full_idw["var1.pred"]))
Principle:
Characterize the spatial structure of the variable studied by a variogram
Construct a linear combination that best predicts/estimates the value at a point by taking into account the correlations between points in the neighborhood
Quantify uncertainty related to prediction (variance of the prediction/estimation)
The variogram is a method which measures the average ‘pattern dissimilarity’ between two samples according to their distance.
The experimental variogram is computed from the data and adjusted with a mathematical model defined by three parameters.
The nugget: the variance between two points at distances smaller than the shortest sampling interval. This variance is due to the measurement error or to the spatial discontinuity.
The range: distance above which sites are non correlated.
The sill: The variance between 2 non correlated sites.
The variogram shape is also determined by i) the slope or speed at which destructuring occurs according to distance, and ii) the direction in which the variogram was computed/constructed (depends on the spatial anisotropy).
Variogram model parameters
Each point of the experimental semivariogram represents half the variance between two sites separated by a specified distance. As the distance increases the variability between sites is assumed to increase.
\[2 \hat{\gamma}(h) = \frac{1}{N_h} \sum_{i,j \in Neigh(h)} (Z_{s_i} - Z_{s_j})^2\]
where \(Neigh(h)=\{(i,j), \mid si-sj \mid = \mid h \mid \}\) is the set of pairs of points separated by a distance \(h\) and \(N_h\) is the number of elements in \(Neigh(h)\).
Practical aspect: Cut the interval of observed distances ([h_{min},h_{max}]) into bins of the same length.
The experimental variogram is sensitive to the choice of breaks and to \(h_{max}\). Often \(h_{max}\) is not the maximum distance but half its value to prevent border effects in the experimental variogram.
Eventually, the experimental variogram can be computed along one or several direction(s), when the processus \(Z_s\) is spatially anisotropic.
# package gstat
v <- variogram(rainfall ~ 1, data= myobs)
v
## np dist gamma dir.hor dir.ver id
## 1 15 5078.697 554.700 0 0 var1
## 2 68 11926.084 3190.882 0 0 var1
## 3 111 19714.898 3683.126 0 0 var1
## 4 132 27743.181 8626.913 0 0 var1
## 5 142 35528.553 8879.391 0 0 var1
## 6 191 42984.622 11295.016 0 0 var1
## 7 172 50941.385 13502.174 0 0 var1
## 8 211 58613.468 15434.417 0 0 var1
## 9 229 66349.844 14101.290 0 0 var1
## 10 229 74535.224 16060.395 0 0 var1
## 11 225 82127.807 16137.349 0 0 var1
## 12 249 90317.707 14494.484 0 0 var1
## 13 240 97924.235 17336.248 0 0 var1
## 14 281 105896.406 13148.614 0 0 var1
## 15 256 113440.560 10941.543 0 0 var1
plot(v)
Semivariogram obtained with myobs dataset
To choose the best model between the list of possible models, a visual inspection is often enough but some statistical criteria like AIC or the weighted Sum of Squares (WSS) are also used.
m.fit <- fit.variogram(v, vgm("Sph"))
m.fit
## model psill range
## 1 Nug 0.00 0.00
## 2 Sph 15292.38 82946.36
plot(v,m.fit)
Comments on the variogram: For adjacent sites, the half variance (or semivariance) is around 0 and not null. This value is called the nugget effect. For a distance above 8.29463610^{4} the semivariance is stabilized and has attained a sill of 1.52923810^{4}.
The RP \(Z_s\) is assumed stationary to use variogram or kriging statistical tools.
A sufficient condition for the existence of the variogram is the intrinsic stationarity. Often, the RP is assumed to be stationary of order 2 (which implies the intrinsic stationarity).
To go further, mathematical model: The RP \(Z_s\) is usually decomposed into a residual component (random process \(R_s\)) and a trend component (deterministic function \(m(s)\)): \[Z(s_i) = R(s_i) + m(s_i) \] The RP \(R_s\) is assumed stationary of order 2 with zero mean value.
Three kriging variants can be distinguished according to the function \(m(s)\) used for the trend.
Estimator (definition and properties)
The value at a new site \(\hat{Z}(s_{new})\) is estimated from the samples in its neighborhood \(Z(s_1), Z(s_2), \ldots, Z(s_{N_{new}})\) by a linear combination:
\[\hat{Z}(s_{new}) = \lambda_1(s_{new}) \, Z(s_1) + \lambda_2(s_{new}) \, Z(s_2) +\ldots+ \lambda_{N_{new}}(s_{new}) \, Z(s_{N_{new}})\]
where \(\lambda_i(s_{new})\), \(1 \le i \le N_{new}\), are determined by solving the system of equations corresponding to these two assumptions:
The first assumption implies the following constraint: \(\sum_{i=1}^{N_{new}} \lambda_i(s_{new}) =1\)
Each predictor \(\hat{Z}(s_{new})\) is a random variable with a variance, often called the kriging variance.This variance depends only on the model variogram and on the spatial pattern of the sites (those observed and those to predict). Therefore this variance does not depend on the observed values \(Z(s_1), Z(s_2), \ldots, Z(s_{N_{new}})\). The kriging variance is optimal for a Gaussian RP \(Z_s\).
To go further the ordinary kriging system:
Remember that \(\forall s_j \in Neighborhood(s_{new}), \, m(s_j) = m(s_{new})\), so: \[\begin{eqnarray*} \hat{Z}(s_{new}) &=& \sum_{i=1}^{N_{new}} \lambda_i(s_{new})\, (Z(s_i) - m(s_{new})) + m(s_{new}) \\ &=& \sum_{i=1}^{N_{new}} \lambda_i(s_{new})\, Z(s_i) + [1-\sum_{i=1}^{N_{new}} \lambda_i(s_{new})]m(s_{new}) \end{eqnarray*}\] The unknown local mean value is filtered from the linear estimator when the kriging weights sum to 1.
The variance of the estimator is: \[\begin{eqnarray*} var[\hat{Z}(s_{new}) - Z(s_{new})] &=& \sum_{i=1}^{N_{new}} \sum_{j=1}^{N_{new}} \lambda_i(s_{new}) \, \lambda_j(s_{new})\, C(s_i,s_j) + C(0) \\ &-& 2 \sum_{i=1}^{N_{new}} \lambda_i(s_{new})\, C(s_i,s_{new}) \end{eqnarray*}\] The minimization of this error variance under the non bias condition leads to the following system of linear equations:
NF.kriged = krige(rainfall ~ 1, locations=myobs, newdata=mydata.points, model = m.fit)
## [using ordinary kriging]
plot(NF.kriged)
Cross validation functions for simple, ordinary or universal point (co)kriging, kriging in a local neighbourhood.
Leave-one-out cross validation (LOOCV) visits a data point, and predicts the value at that location by leaving out the observed value, and proceeds with the next data point. (The observed value is left out because kriging would otherwise predict the value itself.) N-fold cross validation makes a partitions the data set in N parts. For all observation in a part, predictions are made based on the remaining N-1 parts; this is repeated for each of the N parts. N-fold cross validation may be faster than LOOCV.
NF.kriged.cv = krige.cv(rainfall ~ 1, locations=myobs, model = m.fit, nmax=20, nfold=5)
bubble(as_Spatial(NF.kriged.cv), "residual", main = "rainfall: 5-fold CV residuals")
#sic_nobs <- sic_full[-(1:100),]
NF.kriged.nobs = krige(rainfall ~ 1, myobs, model = m.fit, newdata=mydata[!mydata$obs, ],nmax=20)
## [using ordinary kriging]
NF.kriged.nobs$residuals <- mydata[!mydata$obs,]$rainfall-NF.kriged.nobs$var1.pred
bubble(as_Spatial(NF.kriged.nobs), "residuals", main = "rainfall: residuals")
summary(data.frame(obs=mydata[!mydata$obs,]$rainfall,pred=NF.kriged.nobs$var1.pred))
## obs pred
## Min. : 0.0 Min. : -1.695
## 1st Qu.:100.0 1st Qu.:111.374
## Median :162.0 Median :165.489
## Mean :185.4 Mean :182.518
## 3rd Qu.:263.5 3rd Qu.:251.753
## Max. :517.0 Max. :487.654
par(mfrow=c(1,2))
hist(mydata[!mydata$obs,]$rainfall)
hist(NF.kriged.nobs$var1.pred)
Kriging gives a smooth prediction with less dispersion of the rainfall values.
#st_data<-function(SF){st_geometry(SF)<-NULL; return(SF)}
NF.kriged.nobs$rainfall<-mydata[!mydata$obs,]$rainfall
cor(NF.kriged.nobs$rainfall,NF.kriged.nobs$var1.pred)
## [1] 0.8657555
var(NF.kriged.nobs$residuals)
## [1] 3095.841
coef <- coef(lm(var1.pred~rainfall,data=NF.kriged.nobs))
ggplot(NF.kriged.nobs,aes(x=rainfall,y=var1.pred)) +
geom_point() +
geom_abline(slope =1, intercept = 0, col="red",size=2) +
geom_abline(slope =coef[2], intercept = coef[1], col="green",size=2)
The neighborhood can be defined in the same way for kriging and IDW. But:
IDW: each site has a weight inversely proportional to the distance to the site to predict (\(s_{new}\)).
Kriging: The weighting is built through the variogram model and the spatial pattern of the sites.
For both interpolators, the value on a new site is estimated by a weighted linear combination of the neighbors sites.
Universal Kriging assumes a linear or quadratic trend (where spatial coordinates could be used as explanatory variables). \[Z(s_i) = m(s_i) + R(s_i)\] with
Linear trend \(m(s_i)=\beta_0 + \beta_1 x_i + \beta_2 y_i\)
Quadratic trend \(m(s_i)=\beta_0 + \beta_1 x_i + \beta_2 y_i + \beta_3 x_i^2 + \beta_4 y_i^2 + \beta_5 x_i \, y_i\)
In the rainfall example, the universal kriging brings no improvement.
st_add_xy<-function(SF){xy<-SF %>% st_centroid() %>% st_coordinates(); SF$x<-xy[,1]; SF$y<-xy[,2]; return(SF)}
NF.kriged.UK <- krige(rainfall ~ x+y, locations=st_add_xy(myobs), model = m.fit, newdata=st_add_xy(mydata[!mydata$obs,]),nmax=20)
## [using universal kriging]
residuals <- mydata[!mydata$obs,]$rainfall-NF.kriged.UK$var1.pred
NF.kriged.UK$residuals<-residuals
bubble(as_Spatial(NF.kriged.UK), "residuals", main = "rainfall: residuals from Universal Kriging")
Practical recommendation:
Do a Universal kriging or Estimating the trend and computing simple kriging predictions of the residuals is equivalent to Universal Kriging when a linear trend is assumed (Cressie, 1993, section 3.4.5).So:
Goal: Estimate a characteristic or parameter of the RP \(Z_s\), for example a probability map.
Principle
kriging gives an estimate of both the mean value and standard deviation of the normal (Gaussian) variable at each grid node.
Sequential Gaussian Simulation replaces the kriging mean value by a random draw from this normal distribution.
More details can be found in the book from Goaverts
Normal Score Transformation for the Rainfall Example
When data are not Normally distributed, the data can be transformed into normal scores before doing the sequential Gaussian simulation on the normal scores. First, the rainfall dataset is normalized (centered and divided by its standard deviation).
We can see from figure that a rainfall of \(210\) millimeters corresponds to a scaled value of \((210 - 184)/112 = 0.23\) and a probability of \(0.62\). The normal quantile (or Normal Scaled Score) for this probability is \(0.30\). This empirical quantile (\(210\)) to normal quantile (\(0.30\)) transformation preserves the rank of an observation and therefore the probability level.
Illustration of the Normal Score Transformation for the Rainfall dataset
Algorithm of the SGS
Transform data to normal scores
Perform a variogram analysis on the normal scores
Create a grid and generate a random path through the grid nodes
Use kriging to estimate a mean value and standard deviation at the first node
Set the variable value at that node from the random draw
Repeat for next nodes, including previously simulated nodes as data values in the kriging process
The previously simulated grid nodes are included as data in order to preserve the proper covariance structure between the simulated values.
NF.kriged.sim = krige(rainfall ~ 1, locations=mydata, newdata= mydata.points, model = m.fit, nmax=20, nsim=4)
## drawing 4 GLS realisations of beta...
## [using conditional Gaussian simulation]
spplot(as_Spatial(NF.kriged.sim), main = "four conditional simulations")
When non exhaustive secondary information is available, it can be incorporated into the estimator using cokriging approach. This approach takes into account the secondary variables (from the RP \(Z_{j,s} \, N_j, \, j = 2,...,p\)) and their spatial cross correlation with the primary variable (from the RP \(Z_{1,s}\)). The secondary data can be possibly at different sites.
All cokriging estimators required to be unbiased \(E[\hat{Z}_1(s_{new}) - Z_1(s_{new})]=0\) and to minimize the error variance \(Var[\hat{Z}_1(s_{new}) - Z_1(s_{new})]\).
Each RP \(Z_{j,s}\) is decomposed into a residual and a trend components: \[Z_{j,s} = R_{j,s} + m_j(s), \, j=1,\ldots,p\]
The residual component \(R_{j,s}\) is modeled as a stationary RP with zero mean value and:
Covariance function: \(Cov[R_j(s),R_j(s+h)] = C_j(h)\)
Cross covariance function: \(Cov[R_j(s),R_k(s+h)] = C_{jk}(h)\)
The data used in this tutorial comes from the article Kriging in estuaries: as the crow flies, or as the fish swims?, Laurie S. Little, Don Edwards, Dwayne E. Porter, Journal of Experimental Marine Biology and Ecology,213 (1997) 1-11.
Their study evaluated the relative accuracy of eight kriging methods for predicting contaminant and water quality variables measured in an urbanized estuary in South Carolina. The variable of interest was Fluoranthene concentration in oyster tissue was normalized by dividing by the proportion of lipid in tissue, as determined by the proportion of methylene chloride extracted solids.
Load package gstat Load package sp
VARIABLE OF INTEREST : don$NormFluo or don[,11]
ISOTROPY:
ANISOTROPY:
Remark: to get the list of all possible models, write: vgm()
Try this command: anis.vgm = vgm(psill=10000,model=“Sph”, range=6000,nugget=10, anis = c(60, 0.2))
Draw the experimental and this spherical variograms on the same graph, comment
Apply the automatic fitting: mod.vgma <- fit.variogram(obs.vgma,model=anis.vgm)
Draw experimental and estimated spherical variograms on the same graph, comment