Introduction of meuse dataset

The Meuse dataset we used here is included in sp package in R. This data set gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variables at the observation locations, collected in a flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from composite samples of an area of approximately 15m \(\times\) 15m.

library(lattice)
library(sp)
## Warning: package 'sp' was built under R version 3.3.3
data(meuse)
coordinates(meuse) <- c("x", "y")

Here we focus on predicting the topsoil zinc concentration (\(mg\cdot kg^{-1}\)).

spplot(meuse, "zinc", do.log = T, colorkey = TRUE)

bubble(meuse, "zinc", do.log = T, key.space = "bottom")

Non-geostatistical approaches

We can predict the zinc concentration using non-geostatistical methods, such as Inverse Distance Weighted Interpolation (IDW): \[ \hat{Z}(s_0)=\frac{\sum_{i=1}^nw(s_i)Z(s_i)}{\sum_{i=1}^nw(s_i)}, \] where weights \(w(s_i)\) are computed according to their distance to the interpolation location \(s_0\), \[w(s_i)=||s_i-s_0||^{-p}\] with \(||\cdot||\) as Euclidean distance. The idp parameter in function idw is the inverse (Euclidean) distance weighting power \(p\). Large value of idp is similar to small value of k in k-nearest neighbors (KNN).

data(meuse.grid)
coordinates(meuse.grid) <- c("x","y")
meuse.grid <- as(meuse.grid,"SpatialPixelsDataFrame")
library(gstat)
## Warning: package 'gstat' was built under R version 3.3.3
idw.out <- idw(zinc~1,meuse,meuse.grid,idp=2)
## [inverse distance weighted interpolation]
as.data.frame(idw.out)[1:5,]
##        x      y var1.pred var1.var
## 1 181180 333740  633.6864       NA
## 2 181140 333700  712.5450       NA
## 3 181180 333700  654.1617       NA
## 4 181220 333700  604.4422       NA
## 5 181100 333660  857.2558       NA

In practice, IDW interpolation (with small \(p\)) often results in maps that are close to kriged maps when a variogram is isotropic with no or small nugget value (i.e. strong spatial autocorrrelation). Since IDW only use distance information, it ignores the spatial configuration of the samples (e.g. an anisotropic variogram). In addition, since the weights are always between 0 and 1, interpolated values never fall outside the range of data.

We can also predict the zinc concentration using linear regression models. One of the important predictors is the distance to the Meuse river. The figure below shows the relationship between the logarithm of topsoil zinc concentration and the distance to the Meuse river.

xyplot(log(zinc) ~ sqrt(dist), as.data.frame(meuse))

Even after removing the trend from variable dist, there is still some spatial autocorrelation left on the residuals (e.g. neighboring points tends to have similar residual values.)

zn.lm <- lm(log(zinc) ~ sqrt(dist), meuse)
meuse$fitted.s <- predict(zn.lm, meuse) - mean(predict(zn.lm,meuse))
meuse$residuals <- residuals(zn.lm)
spplot(meuse, c("fitted.s", "residuals"))

Estimate spatial correlation: variogram

The spatial correlatin can be observed by looking at the lagged scatter plot. When lag is between 100 meters, the correlation is quite strong (i.e. correlation is 0.722). However, the spatial correlation becomes very close to 0 when lagged distance is beyond 500 meters.

hscat(log(zinc)~1,meuse,(0:9)*100)

Experimental variogram is shown below. The \(\sim1\) defines a single constant mean coefficient model: \(Z(s)=\mu+\epsilon(s)\), where \(s\) is the location.

plot(variogram(log(zinc)~1,meuse),type="l")

I randomly allocate the zinc concentration to different locations and see the experimental variogram, where this should be no spatial correlation. We see the variogram is flat.

dat <- meuse
n   <- nrow(meuse)
set.seed(11)
ind <- sample(1:n,size=n,replace=F)
dat$zinc <- meuse$zinc[ind]
plot(variogram(log(zinc)~1,dat),type="l")

The variogram above uses the default settings for several parameters in the variogram. For example, it ignores the direction effects (i.e. isotropic). The data points are merged on the basis of distance, without considering their directions. We can add direction component into the variogram as follows. Note that 0 is north and 90 is east. As the sample size decreases for each direction, the sample variogram becomes more variable.

plot(variogram(log(zinc)~1,meuse,alpha=c(0,45,90,135)),type="l")

Robust estimator of variogram

We injected some errors on 5 out of the 155 samples in the meuse data set and see the change of variogram and its robust estimator using Cressie-Hawkins estimator.

dat <- meuse
summary(meuse$zinc)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   113.0   198.0   326.0   469.7   674.5  1839.0
set.seed(1)
ind <- sample(1:155,size=5,replace=F)
dat$zinc[ind] <- meuse$zinc[ind]+5000

v.0 <- variogram(log(zinc)~1,meuse)
v.1 <- variogram(log(zinc)~1,dat)
v.2 <- variogram(log(zinc)~1,dat,cressie=T)

plot(v.0$dist,v.0$gamma,ylim=range(v.0$gamma,v.1$gamma,v.2$gamma),xlab="Distance",ylab="Semivariance",type="l")
lines(v.1$dist,v.1$gamma,lty=2,col="red",lwd=2)
lines(v.2$dist,v.2$gamma,lty=3,col="blue",lwd=2)
legend("bottomright",legend=c("Matheron: original","Matheron: corrupted","Cressie: corrupted"),lty=1:3,col=c(1,"red","blue"))

Modelling variogram

We can see the basic variogram available in gstat package.

show.vgms()

vgm()
##    short                                      long
## 1    Nug                              Nug (nugget)
## 2    Exp                         Exp (exponential)
## 3    Sph                           Sph (spherical)
## 4    Gau                            Gau (gaussian)
## 5    Exc        Exclass (Exponential class/stable)
## 6    Mat                              Mat (Matern)
## 7    Ste Mat (Matern, M. Stein's parameterization)
## 8    Cir                            Cir (circular)
## 9    Lin                              Lin (linear)
## 10   Bes                              Bes (bessel)
## 11   Pen                      Pen (pentaspherical)
## 12   Per                            Per (periodic)
## 13   Wav                                Wav (wave)
## 14   Hol                                Hol (hole)
## 15   Log                         Log (logarithmic)
## 16   Pow                               Pow (power)
## 17   Spl                              Spl (spline)
## 18   Leg                            Leg (Legendre)
## 19   Err                   Err (Measurement error)
## 20   Int                           Int (Intercept)

We can fit a sample variogram for a particular model or even a set of variogram models.

v <- variogram(log(zinc)~1,meuse)
fit.variogram(v, vgm("Sph"))
##   model      psill    range
## 1   Nug 0.05065971   0.0000
## 2   Sph 0.59060511 897.0011
fit.variogram(v, vgm(c("Exp", "Sph")))
##   model      psill    range
## 1   Nug 0.05065971   0.0000
## 2   Sph 0.59060511 897.0011

We see the spherical model with nugget=0.051, sill=0.591 and range=897 is chosen. We can plot the sample variogram and fitted model together.

v2 <- vgm(0.591,"Sph",897,0.051)
plot(v,v2)

More complex variogram modelling

Since distance to the Meuse rive (variable dist) is an important factor for zinc concentration, we may consider to model the spatial correlation after removing the trends from dist. The following model fits: \[ \log(Z(s))=\beta_0+\beta_1\sqrt{dist}+\epsilon(s) \]

v3 <- variogram(log(zinc)~sqrt(dist),meuse)
fit.variogram(v3,vgm(c("Exp", "Sph"))) #adjusted by distance effect
##   model      psill    range
## 1   Nug 0.07981792   0.0000
## 2   Sph 0.14905756 872.7236
fit.variogram(v, vgm(c("Exp", "Sph"))) #without adjustment
##   model      psill    range
## 1   Nug 0.05065971   0.0000
## 2   Sph 0.59060511 897.0011

After adjusting the distance effect, we see the sill size is substantially reduced, while the range doesn’t change much.

v4 <- vgm(0.15,"Sph",873,0.08)
plot(v3,v4)

Spatial prediction through kriging

Kriging is used to estimate the values at those locations which have not been sampled. There are several types of krigings, but the following five are commonly used.

  1. The basic technique ordinary kriging (OK) uses a weighted average of neighboring samples to estimate the unknown value at a given location. The random process model for OK is: \[Z(s)=\mu+\epsilon(s)\] with a constant (but unknown) mean \(\mu\) and a variogram \(\gamma(h)\). It is known that OK estimate is BLUP (best linear unbiased prediction). Linear means the estimates are weighted linear combinations of the available data. Unbiased means it tries to have the mean residual equal to 0. Best means it minimizes the error variance, which is the distinguishing feature for kriging.

  2. The simple kriging (SK) is almost the same as OK, except the mean \(\mu\) is known in advance.

  3. The kriging with a trend is usually called universal kriging (UK), where the underlying random process model is: \[Z(s)=f(s)+\epsilon(s)\] with the trend function \(f(s)\) (also called external drift.

  4. Cokriging extends the kriging prediction to multivariate situation (i.e. \(Z\) represents a vector with more than one variables).

  5. By default, \(Z(s)\) is treated as being observed on point location. However, block kriging predicts the averages of larger areas or volumnes instead of a single point.

Like many local estimators in statistics and machine learning (e.g. Loess smoother and \(k\)-nearest neighbors), Kriging estimate mainly depend on the nearest few points to the target location. This is especially the case when the spatial autocorrelation is strong (e.g. the nugget is relatively small compared to the total variance). This is particularly necessary if we have a big spatial dataset with large sample size. On the other hand, if the nugget size is large, then the kriging estimate becomes global.

v <- fit.variogram(variogram(log(zinc)~1,meuse), vgm("Sph"))
mean(log(meuse$zinc))
## [1] 5.885776
#Simple kriging with known mean=5.9
sk.fit <- krige(log(zinc)~1,meuse,meuse.grid,v,beta=5.9)
## [using simple kriging]
#ordinary kriging 
ok.fit <- krige(log(zinc)~1,meuse,meuse.grid,v)
## [using ordinary kriging]
#Universal kriging 
uk.fit <- krige(log(zinc)~sqrt(dist),meuse,meuse.grid,v)
## [using universal kriging]

For co-kriging, we need to have the multivariate variogram first. Let’s focus on four heavy metals: Cd, Cu, Pb and Zn.

g <- gstat(NULL,"logCd",log(cadmium)~1,meuse)
g <- gstat(g,"logCu",log(copper)~1,meuse)
g <- gstat(g,"logPb",log(lead)~1,meuse)
g <- gstat(g,"logZn",log(zinc)~1,meuse)
vm <- variogram(g)
vm.fit <- fit.lmc(vm, g, vgm("Sph"))
vm.fit
## data:
## logCd : formula = log(cadmium)`~`1 ; data dim = 155 x 14
## logCu : formula = log(copper)`~`1 ; data dim = 155 x 14
## logPb : formula = log(lead)`~`1 ; data dim = 155 x 14
## logZn : formula = log(zinc)`~`1 ; data dim = 155 x 14
## variograms:
##                model        psill    range
## logCd[1]         Nug  0.355746812   0.0000
## logCd[2]         Sph  1.114135729 514.4008
## logCu[1]         Nug  0.036437333   0.0000
## logCu[2]         Sph  0.223803442 514.4008
## logPb[1]         Nug  0.015270497   0.0000
## logPb[2]         Sph  0.445817679 514.4008
## logZn[1]         Nug  0.015168388   0.0000
## logZn[2]         Sph  0.530400664 514.4008
## logCd.logCu[1]   Nug  0.068358904   0.0000
## logCd.logCu[2]   Sph  0.447795786 514.4008
## logCd.logPb[1]   Nug -0.014291619   0.0000
## logCd.logPb[2]   Sph  0.676690873 514.4008
## logCu.logPb[1]   Nug -0.006712210   0.0000
## logCu.logPb[2]   Sph  0.303445099 514.4008
## logCd.logZn[1]   Nug  0.012934851   0.0000
## logCd.logZn[2]   Sph  0.746564601 514.4008
## logCu.logZn[1]   Nug -0.001198233   0.0000
## logCu.logZn[2]   Sph  0.333643044 514.4008
## logPb.logZn[1]   Nug -0.013961111   0.0000
## logPb.logZn[2]   Sph  0.484891440 514.4008
plot(vm,vm.fit)

Function fit.lmc fits a multivariate variogram (LMC stands for Linear Model of Coregionalization). Then we can run co-kriging using the fitted variogram. We can have an overview of the fitted error variance and covariance.

ck.maps <- predict(vm.fit, meuse.grid)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
spplot.vcov(ck.maps)

For block kriging, we need to specify the bloc size and shape.

#Ordinary block kriging for blocks of size 40mx40m
bk.fit1 <- krige(log(zinc)~1,meuse,meuse.grid,v,block=c(40,40))
## [using ordinary kriging]
#Circular shape with radius=20, centered on pts of meuse.grid
#one could select all pts on a regular grid within a circle
xy      <- expand.grid(x=seq(-20,20,4),y=seq(-20,20,4))
xy      <- xy[(xy$x^2+xy$y^2)<=20^2,]
bk.fit2 <- krige(log(zinc)~1,meuse,meuse.grid,v,block=xy)
## [using ordinary kriging]

For universal kriging, we will include the trend component into the model. Suppose we fit the trend as a linear regression, then the UK estimate can be expressed as: \[ \hat{Z}(s)=x(s)\hat{\beta}+v^\prime V^{-1}(Z(s)-X\hat{\beta}), \] where \(\hat{\beta}=(X^\prime V^{-1} X)^{-1}X^\prime V^{-1}Z(s)\) the weighted least square estimate of the trend coefficients and \(X^\prime\) is the transpose of the input matrix \(X\). The first part of UK estimate (i.e. \(x(s)\hat{\beta}\)) is an BLUE (Best Linear Unbiased Estimate) of the mean value at location \(s\). The second part (i.e. \(v^\prime V^{-1}(Z(s)-X\hat{\beta})\)) is the weighted mean of the residuals from the mean function with weights \(v^\prime V^{-1}\), known as the simple kriging weights.

#Universal kriging with trend component, which is a linear
#function of the square root of distance to river Meuse 
uk.fit <- gstat(formula=log(zinc)~sqrt(dist),data=meuse,model=v)
predict(uk.fit, meuse[1,]) #kriging estimate (BLUP)
## [using universal kriging]
##        coordinates var1.pred     var1.var
## 1 (181072, 333611)  6.929517 1.110223e-16
predict(uk.fit, meuse[1,],BLUE=T) #Least square estimate of trend (BLUE)
## [generalized least squares trend estimation]
##        coordinates var1.pred   var1.var
## 1 (181072, 333611)  6.862085 0.06123888

We see the kriging estimate (BLUP) has zero variance, while the second yields the generalized least square estimate (BLUE) for the trend component.

Validation of kriging models

In order to objectively evaluate the model’s prediction performance on some new data points, we usually split the dataset into calibration (training) and validation (test) sets. The calibration set is used to train the model and then predict the samples in the validation set.

set.seed(1) #set the random seed makes the results reproducible
indx  <- sample(1:155,size=100,replace=F)
calib <- meuse[indx,] #calibration (training) set
valid <- meuse[-indx,] #validation (test) set
v2    <- fit.variogram(variogram(log(zinc)~1,calib),vgm("Sph"))
ok.ft <- krige(log(zinc)~1,calib,valid,v2)
## [using ordinary kriging]
dim(ok.ft)
## [1] 55  2
ok.ft[1:5,]
##         coordinates var1.pred  var1.var
## 1  (181072, 333611)  6.735225 0.2177308
## 5  (181307, 333330)  5.480105 0.2727847
## 6  (181390, 333260)  5.352461 0.3363291
## 7  (181165, 333370)  5.861019 0.2261496
## 11 (181191, 333115)  5.326827 0.1957239
resid <- log(valid$zinc)-ok.ft$var1.pred
summary(resid)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.77970 -0.19110  0.00658  0.05241  0.26390  1.42800
#R-square on the test set
1-sum(resid^2)/sum((log(valid$zinc)-mean(log(valid$zinc)))^2)
## [1] 0.6227114

We can visually examine the residuals on the validation set using bubble plot. The color shows the sign of the residual and the symbol area is proportional to the absolute value of the residual.

ok.ft$resd <- resid
bubble(ok.ft,"resd")

The above approach will give you different results for a different split. It is more variable than using the \(k\)-fold cross-validation approach.

set.seed(1) #set the random seed makes the results reproducible
ok.cv <- krige.cv(log(zinc)~1,meuse,v,nfold=5)
bubble(ok.cv,"residual",main="log(zinc): 5-fold CV")

Note that the krige.cv function above and the gstat.cv function for the multivariable case DO NOT re-fit variograms for each fold. The variogram is usually fit on the complete dataset. The CV results are not completely independent from the modelling data, since the validation data also contribute to the variogram model.

In order to check any outlying residuals, we can use the standardized residuals (i.e. z-scores), which is included in the krige.cv output. \[ z_i=\frac{Z(s_i)-\hat{Z}_{[i]}(s_i)}{\sigma_{[i]}(s_i)}, \] where \(\hat{Z}_{[i]}(s_i)\) is the CV prediction for \(s_i\) and \(\sigma_{[i]}(s_i)\) is the corresponding kriging standard error. If the fitted variogram is correct, then the \(z_i\) is mean 0 and variance 1. If \(Z(s)\) is normally distributed, then \(z_i\) is standard normal (i.e. \(N(0,1)\)).

summary(ok.cv)
## Object of class SpatialPointsDataFrame
## Coordinates:
##      min    max
## x 178605 181390
## y 329714 333611
## Is projected: NA 
## proj4string : [NA]
## Number of points: 155
## Data attributes:
##    var1.pred        var1.var         observed        residual        
##  Min.   :4.895   Min.   :0.1179   Min.   :4.727   Min.   :-1.006365  
##  1st Qu.:5.396   1st Qu.:0.1642   1st Qu.:5.288   1st Qu.:-0.218940  
##  Median :5.886   Median :0.1888   Median :5.787   Median :-0.036317  
##  Mean   :5.896   Mean   :0.2009   Mean   :5.886   Mean   :-0.009827  
##  3rd Qu.:6.343   3rd Qu.:0.2290   3rd Qu.:6.514   3rd Qu.: 0.210041  
##  Max.   :7.238   Max.   :0.5443   Max.   :7.517   Max.   : 1.425695  
##      zscore              fold      
##  Min.   :-2.34541   Min.   :1.000  
##  1st Qu.:-0.50313   1st Qu.:2.000  
##  Median :-0.07797   Median :3.000  
##  Mean   :-0.01770   Mean   :3.052  
##  3rd Qu.: 0.50430   3rd Qu.:4.000  
##  Max.   : 3.16781   Max.   :5.000
bubble(ok.cv,"zscore",main="zscore: 5-fold CV")

Compare variogram models through kriging

Cross-validation provides a means of choosing among plausible models for variograms. Oliver and Webster (2014) suggested to use mean squared error (MSE) and mean squared deviation ratio (MSDR) to choose variogram models. Since kriging minimizes MSE, so the smaller MSE, the better the model is. MSDR is the mean of squared errors divided by the corresponding kriging variances, which should be close to 1. \[ MSDR=\frac{1}{n}\sum_{i}^n \frac{[z(s_i)-\hat{z}(s_i)]^2}{\hat{\sigma}_k^2(s_i)} \] Here we compare two variogram models, Spherical vs. Gaussian, using universal kriging. It seems using Spherical model is slightly better than the Gaussian model.

#Fit a Spherical variogram model
v.1 <- fit.variogram(variogram(log(zinc)~sqrt(dist),meuse), vgm("Sph"))
uk.loocv <- krige.cv(log(zinc)~sqrt(dist),meuse,v.1)
mean(uk.loocv$residual)         #mean error (should close to zero)
## [1] -0.002853345
sqrt(mean(uk.loocv$residual^2)) #RMSE (should be smaller)
## [1] 0.3752725
mean(uk.loocv$residual^2/as.data.frame(uk.loocv[,2])$var1.var) # MSDR (should close to 1)
## [1] 1.083495
#Fit a Gaussian variogram model
v.2 <- fit.variogram(variogram(log(zinc)~sqrt(dist),meuse), vgm("Gau"))
uk.loocv2 <- krige.cv(log(zinc)~sqrt(dist),meuse,v.2)
mean(uk.loocv2$residual)         #mean error (should close to zero)
## [1] -0.002257756
sqrt(mean(uk.loocv2$residual^2)) #RMSE (should be smaller)
## [1] 0.3761486
mean(uk.loocv2$residual^2/as.data.frame(uk.loocv2[,2])$var1.var) # MSDR (should close to 1)
## [1] 1.10786