The Meuse dataset we used here is included in sp package in R. This data set gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variables at the observation locations, collected in a flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from composite samples of an area of approximately 15m \(\times\) 15m.
library(lattice)
library(sp)
## Warning: package 'sp' was built under R version 3.3.3
data(meuse)
coordinates(meuse) <- c("x", "y")
Here we focus on predicting the topsoil zinc concentration (\(mg\cdot kg^{-1}\)).
spplot(meuse, "zinc", do.log = T, colorkey = TRUE)
bubble(meuse, "zinc", do.log = T, key.space = "bottom")
We can predict the zinc concentration using non-geostatistical methods, such as Inverse Distance Weighted Interpolation (IDW): \[ \hat{Z}(s_0)=\frac{\sum_{i=1}^nw(s_i)Z(s_i)}{\sum_{i=1}^nw(s_i)}, \] where weights \(w(s_i)\) are computed according to their distance to the interpolation location \(s_0\), \[w(s_i)=||s_i-s_0||^{-p}\] with \(||\cdot||\) as Euclidean distance. The idp parameter in function idw is the inverse (Euclidean) distance weighting power \(p\). Large value of idp is similar to small value of k in k-nearest neighbors (KNN).
data(meuse.grid)
coordinates(meuse.grid) <- c("x","y")
meuse.grid <- as(meuse.grid,"SpatialPixelsDataFrame")
library(gstat)
## Warning: package 'gstat' was built under R version 3.3.3
idw.out <- idw(zinc~1,meuse,meuse.grid,idp=2)
## [inverse distance weighted interpolation]
as.data.frame(idw.out)[1:5,]
## x y var1.pred var1.var
## 1 181180 333740 633.6864 NA
## 2 181140 333700 712.5450 NA
## 3 181180 333700 654.1617 NA
## 4 181220 333700 604.4422 NA
## 5 181100 333660 857.2558 NA
In practice, IDW interpolation (with small \(p\)) often results in maps that are close to kriged maps when a variogram is isotropic with no or small nugget value (i.e. strong spatial autocorrrelation). Since IDW only use distance information, it ignores the spatial configuration of the samples (e.g. an anisotropic variogram). In addition, since the weights are always between 0 and 1, interpolated values never fall outside the range of data.
We can also predict the zinc concentration using linear regression models. One of the important predictors is the distance to the Meuse river. The figure below shows the relationship between the logarithm of topsoil zinc concentration and the distance to the Meuse river.
xyplot(log(zinc) ~ sqrt(dist), as.data.frame(meuse))
Even after removing the trend from variable dist, there is still some spatial autocorrelation left on the residuals (e.g. neighboring points tends to have similar residual values.)
zn.lm <- lm(log(zinc) ~ sqrt(dist), meuse)
meuse$fitted.s <- predict(zn.lm, meuse) - mean(predict(zn.lm,meuse))
meuse$residuals <- residuals(zn.lm)
spplot(meuse, c("fitted.s", "residuals"))
The spatial correlatin can be observed by looking at the lagged scatter plot. When lag is between 100 meters, the correlation is quite strong (i.e. correlation is 0.722). However, the spatial correlation becomes very close to 0 when lagged distance is beyond 500 meters.
hscat(log(zinc)~1,meuse,(0:9)*100)
Experimental variogram is shown below. The \(\sim1\) defines a single constant mean coefficient model: \(Z(s)=\mu+\epsilon(s)\), where \(s\) is the location.
plot(variogram(log(zinc)~1,meuse),type="l")
I randomly allocate the zinc concentration to different locations and see the experimental variogram, where this should be no spatial correlation. We see the variogram is flat.
dat <- meuse
n <- nrow(meuse)
set.seed(11)
ind <- sample(1:n,size=n,replace=F)
dat$zinc <- meuse$zinc[ind]
plot(variogram(log(zinc)~1,dat),type="l")
The variogram above uses the default settings for several parameters in the variogram. For example, it ignores the direction effects (i.e. isotropic). The data points are merged on the basis of distance, without considering their directions. We can add direction component into the variogram as follows. Note that 0 is north and 90 is east. As the sample size decreases for each direction, the sample variogram becomes more variable.
plot(variogram(log(zinc)~1,meuse,alpha=c(0,45,90,135)),type="l")
We injected some errors on 5 out of the 155 samples in the meuse data set and see the change of variogram and its robust estimator using Cressie-Hawkins estimator.
dat <- meuse
summary(meuse$zinc)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 113.0 198.0 326.0 469.7 674.5 1839.0
set.seed(1)
ind <- sample(1:155,size=5,replace=F)
dat$zinc[ind] <- meuse$zinc[ind]+5000
v.0 <- variogram(log(zinc)~1,meuse)
v.1 <- variogram(log(zinc)~1,dat)
v.2 <- variogram(log(zinc)~1,dat,cressie=T)
plot(v.0$dist,v.0$gamma,ylim=range(v.0$gamma,v.1$gamma,v.2$gamma),xlab="Distance",ylab="Semivariance",type="l")
lines(v.1$dist,v.1$gamma,lty=2,col="red",lwd=2)
lines(v.2$dist,v.2$gamma,lty=3,col="blue",lwd=2)
legend("bottomright",legend=c("Matheron: original","Matheron: corrupted","Cressie: corrupted"),lty=1:3,col=c(1,"red","blue"))
We can see the basic variogram available in gstat package.
show.vgms()
vgm()
## short long
## 1 Nug Nug (nugget)
## 2 Exp Exp (exponential)
## 3 Sph Sph (spherical)
## 4 Gau Gau (gaussian)
## 5 Exc Exclass (Exponential class/stable)
## 6 Mat Mat (Matern)
## 7 Ste Mat (Matern, M. Stein's parameterization)
## 8 Cir Cir (circular)
## 9 Lin Lin (linear)
## 10 Bes Bes (bessel)
## 11 Pen Pen (pentaspherical)
## 12 Per Per (periodic)
## 13 Wav Wav (wave)
## 14 Hol Hol (hole)
## 15 Log Log (logarithmic)
## 16 Pow Pow (power)
## 17 Spl Spl (spline)
## 18 Leg Leg (Legendre)
## 19 Err Err (Measurement error)
## 20 Int Int (Intercept)
We can fit a sample variogram for a particular model or even a set of variogram models.
v <- variogram(log(zinc)~1,meuse)
fit.variogram(v, vgm("Sph"))
## model psill range
## 1 Nug 0.05065971 0.0000
## 2 Sph 0.59060511 897.0011
fit.variogram(v, vgm(c("Exp", "Sph")))
## model psill range
## 1 Nug 0.05065971 0.0000
## 2 Sph 0.59060511 897.0011
We see the spherical model with nugget=0.051, sill=0.591 and range=897 is chosen. We can plot the sample variogram and fitted model together.
v2 <- vgm(0.591,"Sph",897,0.051)
plot(v,v2)
Since distance to the Meuse rive (variable dist) is an important factor for zinc concentration, we may consider to model the spatial correlation after removing the trends from dist. The following model fits: \[ \log(Z(s))=\beta_0+\beta_1\sqrt{dist}+\epsilon(s) \]
v3 <- variogram(log(zinc)~sqrt(dist),meuse)
fit.variogram(v3,vgm(c("Exp", "Sph"))) #adjusted by distance effect
## model psill range
## 1 Nug 0.07981792 0.0000
## 2 Sph 0.14905756 872.7236
fit.variogram(v, vgm(c("Exp", "Sph"))) #without adjustment
## model psill range
## 1 Nug 0.05065971 0.0000
## 2 Sph 0.59060511 897.0011
After adjusting the distance effect, we see the sill size is substantially reduced, while the range doesn’t change much.
v4 <- vgm(0.15,"Sph",873,0.08)
plot(v3,v4)
Kriging is used to estimate the values at those locations which have not been sampled. There are several types of krigings, but the following five are commonly used.
The basic technique ordinary kriging (OK) uses a weighted average of neighboring samples to estimate the unknown value at a given location. The random process model for OK is: \[Z(s)=\mu+\epsilon(s)\] with a constant (but unknown) mean \(\mu\) and a variogram \(\gamma(h)\). It is known that OK estimate is BLUP (best linear unbiased prediction). Linear means the estimates are weighted linear combinations of the available data. Unbiased means it tries to have the mean residual equal to 0. Best means it minimizes the error variance, which is the distinguishing feature for kriging.
The simple kriging (SK) is almost the same as OK, except the mean \(\mu\) is known in advance.
The kriging with a trend is usually called universal kriging (UK), where the underlying random process model is: \[Z(s)=f(s)+\epsilon(s)\] with the trend function \(f(s)\) (also called external drift.
Cokriging extends the kriging prediction to multivariate situation (i.e. \(Z\) represents a vector with more than one variables).
By default, \(Z(s)\) is treated as being observed on point location. However, block kriging predicts the averages of larger areas or volumnes instead of a single point.
Like many local estimators in statistics and machine learning (e.g. Loess smoother and \(k\)-nearest neighbors), Kriging estimate mainly depend on the nearest few points to the target location. This is especially the case when the spatial autocorrelation is strong (e.g. the nugget is relatively small compared to the total variance). This is particularly necessary if we have a big spatial dataset with large sample size. On the other hand, if the nugget size is large, then the kriging estimate becomes global.
v <- fit.variogram(variogram(log(zinc)~1,meuse), vgm("Sph"))
mean(log(meuse$zinc))
## [1] 5.885776
#Simple kriging with known mean=5.9
sk.fit <- krige(log(zinc)~1,meuse,meuse.grid,v,beta=5.9)
## [using simple kriging]
#ordinary kriging
ok.fit <- krige(log(zinc)~1,meuse,meuse.grid,v)
## [using ordinary kriging]
#Universal kriging
uk.fit <- krige(log(zinc)~sqrt(dist),meuse,meuse.grid,v)
## [using universal kriging]
For co-kriging, we need to have the multivariate variogram first. Let’s focus on four heavy metals: Cd, Cu, Pb and Zn.
g <- gstat(NULL,"logCd",log(cadmium)~1,meuse)
g <- gstat(g,"logCu",log(copper)~1,meuse)
g <- gstat(g,"logPb",log(lead)~1,meuse)
g <- gstat(g,"logZn",log(zinc)~1,meuse)
vm <- variogram(g)
vm.fit <- fit.lmc(vm, g, vgm("Sph"))
vm.fit
## data:
## logCd : formula = log(cadmium)`~`1 ; data dim = 155 x 14
## logCu : formula = log(copper)`~`1 ; data dim = 155 x 14
## logPb : formula = log(lead)`~`1 ; data dim = 155 x 14
## logZn : formula = log(zinc)`~`1 ; data dim = 155 x 14
## variograms:
## model psill range
## logCd[1] Nug 0.355746812 0.0000
## logCd[2] Sph 1.114135729 514.4008
## logCu[1] Nug 0.036437333 0.0000
## logCu[2] Sph 0.223803442 514.4008
## logPb[1] Nug 0.015270497 0.0000
## logPb[2] Sph 0.445817679 514.4008
## logZn[1] Nug 0.015168388 0.0000
## logZn[2] Sph 0.530400664 514.4008
## logCd.logCu[1] Nug 0.068358904 0.0000
## logCd.logCu[2] Sph 0.447795786 514.4008
## logCd.logPb[1] Nug -0.014291619 0.0000
## logCd.logPb[2] Sph 0.676690873 514.4008
## logCu.logPb[1] Nug -0.006712210 0.0000
## logCu.logPb[2] Sph 0.303445099 514.4008
## logCd.logZn[1] Nug 0.012934851 0.0000
## logCd.logZn[2] Sph 0.746564601 514.4008
## logCu.logZn[1] Nug -0.001198233 0.0000
## logCu.logZn[2] Sph 0.333643044 514.4008
## logPb.logZn[1] Nug -0.013961111 0.0000
## logPb.logZn[2] Sph 0.484891440 514.4008
plot(vm,vm.fit)
Function fit.lmc fits a multivariate variogram (LMC stands for Linear Model of Coregionalization). Then we can run co-kriging using the fitted variogram. We can have an overview of the fitted error variance and covariance.
ck.maps <- predict(vm.fit, meuse.grid)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
spplot.vcov(ck.maps)
For block kriging, we need to specify the bloc size and shape.
#Ordinary block kriging for blocks of size 40mx40m
bk.fit1 <- krige(log(zinc)~1,meuse,meuse.grid,v,block=c(40,40))
## [using ordinary kriging]
#Circular shape with radius=20, centered on pts of meuse.grid
#one could select all pts on a regular grid within a circle
xy <- expand.grid(x=seq(-20,20,4),y=seq(-20,20,4))
xy <- xy[(xy$x^2+xy$y^2)<=20^2,]
bk.fit2 <- krige(log(zinc)~1,meuse,meuse.grid,v,block=xy)
## [using ordinary kriging]
For universal kriging, we will include the trend component into the model. Suppose we fit the trend as a linear regression, then the UK estimate can be expressed as: \[ \hat{Z}(s)=x(s)\hat{\beta}+v^\prime V^{-1}(Z(s)-X\hat{\beta}), \] where \(\hat{\beta}=(X^\prime V^{-1} X)^{-1}X^\prime V^{-1}Z(s)\) the weighted least square estimate of the trend coefficients and \(X^\prime\) is the transpose of the input matrix \(X\). The first part of UK estimate (i.e. \(x(s)\hat{\beta}\)) is an BLUE (Best Linear Unbiased Estimate) of the mean value at location \(s\). The second part (i.e. \(v^\prime V^{-1}(Z(s)-X\hat{\beta})\)) is the weighted mean of the residuals from the mean function with weights \(v^\prime V^{-1}\), known as the simple kriging weights.
#Universal kriging with trend component, which is a linear
#function of the square root of distance to river Meuse
uk.fit <- gstat(formula=log(zinc)~sqrt(dist),data=meuse,model=v)
predict(uk.fit, meuse[1,]) #kriging estimate (BLUP)
## [using universal kriging]
## coordinates var1.pred var1.var
## 1 (181072, 333611) 6.929517 1.110223e-16
predict(uk.fit, meuse[1,],BLUE=T) #Least square estimate of trend (BLUE)
## [generalized least squares trend estimation]
## coordinates var1.pred var1.var
## 1 (181072, 333611) 6.862085 0.06123888
We see the kriging estimate (BLUP) has zero variance, while the second yields the generalized least square estimate (BLUE) for the trend component.
In order to objectively evaluate the model’s prediction performance on some new data points, we usually split the dataset into calibration (training) and validation (test) sets. The calibration set is used to train the model and then predict the samples in the validation set.
set.seed(1) #set the random seed makes the results reproducible
indx <- sample(1:155,size=100,replace=F)
calib <- meuse[indx,] #calibration (training) set
valid <- meuse[-indx,] #validation (test) set
v2 <- fit.variogram(variogram(log(zinc)~1,calib),vgm("Sph"))
ok.ft <- krige(log(zinc)~1,calib,valid,v2)
## [using ordinary kriging]
dim(ok.ft)
## [1] 55 2
ok.ft[1:5,]
## coordinates var1.pred var1.var
## 1 (181072, 333611) 6.735225 0.2177308
## 5 (181307, 333330) 5.480105 0.2727847
## 6 (181390, 333260) 5.352461 0.3363291
## 7 (181165, 333370) 5.861019 0.2261496
## 11 (181191, 333115) 5.326827 0.1957239
resid <- log(valid$zinc)-ok.ft$var1.pred
summary(resid)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.77970 -0.19110 0.00658 0.05241 0.26390 1.42800
#R-square on the test set
1-sum(resid^2)/sum((log(valid$zinc)-mean(log(valid$zinc)))^2)
## [1] 0.6227114
We can visually examine the residuals on the validation set using bubble plot. The color shows the sign of the residual and the symbol area is proportional to the absolute value of the residual.
ok.ft$resd <- resid
bubble(ok.ft,"resd")
The above approach will give you different results for a different split. It is more variable than using the \(k\)-fold cross-validation approach.
set.seed(1) #set the random seed makes the results reproducible
ok.cv <- krige.cv(log(zinc)~1,meuse,v,nfold=5)
bubble(ok.cv,"residual",main="log(zinc): 5-fold CV")
Note that the krige.cv function above and the gstat.cv function for the multivariable case DO NOT re-fit variograms for each fold. The variogram is usually fit on the complete dataset. The CV results are not completely independent from the modelling data, since the validation data also contribute to the variogram model.
In order to check any outlying residuals, we can use the standardized residuals (i.e. z-scores), which is included in the krige.cv output. \[ z_i=\frac{Z(s_i)-\hat{Z}_{[i]}(s_i)}{\sigma_{[i]}(s_i)}, \] where \(\hat{Z}_{[i]}(s_i)\) is the CV prediction for \(s_i\) and \(\sigma_{[i]}(s_i)\) is the corresponding kriging standard error. If the fitted variogram is correct, then the \(z_i\) is mean 0 and variance 1. If \(Z(s)\) is normally distributed, then \(z_i\) is standard normal (i.e. \(N(0,1)\)).
summary(ok.cv)
## Object of class SpatialPointsDataFrame
## Coordinates:
## min max
## x 178605 181390
## y 329714 333611
## Is projected: NA
## proj4string : [NA]
## Number of points: 155
## Data attributes:
## var1.pred var1.var observed residual
## Min. :4.895 Min. :0.1179 Min. :4.727 Min. :-1.006365
## 1st Qu.:5.396 1st Qu.:0.1642 1st Qu.:5.288 1st Qu.:-0.218940
## Median :5.886 Median :0.1888 Median :5.787 Median :-0.036317
## Mean :5.896 Mean :0.2009 Mean :5.886 Mean :-0.009827
## 3rd Qu.:6.343 3rd Qu.:0.2290 3rd Qu.:6.514 3rd Qu.: 0.210041
## Max. :7.238 Max. :0.5443 Max. :7.517 Max. : 1.425695
## zscore fold
## Min. :-2.34541 Min. :1.000
## 1st Qu.:-0.50313 1st Qu.:2.000
## Median :-0.07797 Median :3.000
## Mean :-0.01770 Mean :3.052
## 3rd Qu.: 0.50430 3rd Qu.:4.000
## Max. : 3.16781 Max. :5.000
bubble(ok.cv,"zscore",main="zscore: 5-fold CV")
Cross-validation provides a means of choosing among plausible models for variograms. Oliver and Webster (2014) suggested to use mean squared error (MSE) and mean squared deviation ratio (MSDR) to choose variogram models. Since kriging minimizes MSE, so the smaller MSE, the better the model is. MSDR is the mean of squared errors divided by the corresponding kriging variances, which should be close to 1. \[ MSDR=\frac{1}{n}\sum_{i}^n \frac{[z(s_i)-\hat{z}(s_i)]^2}{\hat{\sigma}_k^2(s_i)} \] Here we compare two variogram models, Spherical vs. Gaussian, using universal kriging. It seems using Spherical model is slightly better than the Gaussian model.
#Fit a Spherical variogram model
v.1 <- fit.variogram(variogram(log(zinc)~sqrt(dist),meuse), vgm("Sph"))
uk.loocv <- krige.cv(log(zinc)~sqrt(dist),meuse,v.1)
mean(uk.loocv$residual) #mean error (should close to zero)
## [1] -0.002853345
sqrt(mean(uk.loocv$residual^2)) #RMSE (should be smaller)
## [1] 0.3752725
mean(uk.loocv$residual^2/as.data.frame(uk.loocv[,2])$var1.var) # MSDR (should close to 1)
## [1] 1.083495
#Fit a Gaussian variogram model
v.2 <- fit.variogram(variogram(log(zinc)~sqrt(dist),meuse), vgm("Gau"))
uk.loocv2 <- krige.cv(log(zinc)~sqrt(dist),meuse,v.2)
mean(uk.loocv2$residual) #mean error (should close to zero)
## [1] -0.002257756
sqrt(mean(uk.loocv2$residual^2)) #RMSE (should be smaller)
## [1] 0.3761486
mean(uk.loocv2$residual^2/as.data.frame(uk.loocv2[,2])$var1.var) # MSDR (should close to 1)
## [1] 1.10786