Package 'laGP'

Title: Local Approximate Gaussian Process Regression
Description: Performs approximate GP regression for large computer experiments and spatial datasets. The approximation is based on finding small local designs for prediction (independently) at particular inputs. OpenMP and SNOW parallelization are supported for prediction over a vast out-of-sample testing set; GPU acceleration is also supported for an important subroutine. OpenMP and GPU features may require special compilation. An interface to lower-level (full) GP inference and prediction is provided. Wrapper routines for blackbox optimization under mixed equality and inequality constraints via an augmented Lagrangian scheme, and for large scale computer model calibration, are also provided. For details and tutorial, see Gramacy (2016) <doi:10.18637/jss.v072.i01>.
Authors: Robert B. Gramacy <[email protected]>, Furong Sun <[email protected]>
Maintainer: Robert B. Gramacy <[email protected]>
License: LGPL
Version: 1.5-9
Built: 2024-11-21 06:10:45 UTC
Source: https://github.com/cran/laGP

Help Index


Localized Approximate GP Regression For Many Predictive Locations

Description

Facilitates localized Gaussian process inference and prediction at a large set of predictive locations, by (essentially) calling laGP at each location and returning the moments of the predictive equations, together with the indices into the design thus obtained

Usage

aGP(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)), 
    numrays = ncol(X), num.gpus = 0, gpu.threads = num.gpus, 
    omp.threads = if (num.gpus > 0) 0 else 1, 
    nn.gpu = if (num.gpus > 0) nrow(XX) else 0, verb = 1)
aGP.parallel(cls, XX, chunks = length(cls), X, Z, start = 6, end = 50, 
    d = NULL, g = 1/10000, method = c("alc", "alcray", "mspe", "nn", "fish"), 
    Xi.ret = TRUE, 
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), num.gpus = 0, gpu.threads = num.gpus, 
    omp.threads = if (num.gpus > 0) 0 else 1,
    nn.gpu = if (num.gpus > 0) nrow(XX) else 0, verb = 1)
aGP.R(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), laGP=laGP.R, verb = 1)
aGPsep(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "nn"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X),  omp.threads = 1, verb = 1)
aGPsep.R(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "nn"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), laGPsep=laGPsep.R, verb = 1)
aGP.seq(X, Z, XX, d, methods=rep("alc", 2), M=NULL, ncalib=0, ...)

Arguments

X

a matrix or data.frame containing the full (large) design matrix of input locations

Z

a vector of responses/dependent values with length(Z) = nrow(X)

XX

a matrix or data.frame of out-of-sample predictive locations with ncol(XX) = ncol(X); aGP calls laGP for each row of XX as a value of Xref, independently

start

the number of Nearest Neighbor (NN) locations to start each independent call to laGP with; must have start >= 6

end

the total size of the local designs; start < end

d

a prior or initial setting for the lengthscale parameter in a Gaussian correlation function; a (default) NULL value triggers a sensible regularization (prior) and initial setting to be generated via darg; a scalar specifies an initial value, causing darg to only generate the prior; otherwise, a list or partial list matching the output of darg can be used to specify a custom prior. In the case of a partial list, only the missing entries will be generated. Note that a default/generated list specifies MLE/MAP inference for this parameter. When specifying initial values, a vector of length nrow(XX) can be provided, giving a different initial value for each predictive location. With aGPsep, the starting values can be an ncol(X)-by-nrow(XX) matrix or an ncol(X) vector

g

a prior or initial setting for the nugget parameter; a NULL value causes a sensible regularization (prior) and initial setting to be generated via garg; a scalar (default g = 1/10000) specifies an initial value, causing garg to only generate the prior; otherwise, a list or partial list matching the output of garg can be used to specify a custom prior. In the case of a partial list, only the missing entries will be generated. Note that a default/generated list specifies no inference for this parameter; i.e., it is fixed at its starting or default value, which may be appropriate for emulating deterministic computer code output. In such situations a value much smaller than the default may work even better (i.e., yield better out-of-sample predictive performance). The default was chosen conservatively. When specifying non-default initial values, a vector of length nrow(XX) can be provided, giving a different initial value for each predictive location

method

specifies the method by which end-start candidates from X are chosen in order to predict at each row of XX independently. In brief, ALC ("alc", default) minimizes predictive variance; ALCRAY ("alcray") executes a thrifty search focused on rays emanating from the reference location(s); MSPE ("mspe") augments ALC with extra derivative information to minimize mean-squared prediction error (requires extra computation); NN ("nn") uses nearest neighbors; and EFI ("fish") uses the expected Fisher information - essentially 1/G from Gramacy & Apley (2015) - which is a global heuristic, i.e., not localized to each row of XX

methods

for aGP.seq this is a vectorized method argument, containing a list of valid methods to perform in sequence. When methods = FALSE a call to M is invoked instead; see below for more details

Xi.ret

a scalar logical indicating whether or not a matrix of indices into X, describing the chosen sub-design for each of the predictive locations in XX, should be returned on output

close

a non-negative integer end < close <= nrow(X) specifying the number of NNs (to each row XX) in X to consider when searching for the sub-design; close = 0 specifies all. For method="alcray" this specifies the scope used to snap ray-based solutions back to elements of X, otherwise there are no restrictions on that search

numrays

a scalar integer indicating the number of rays for each greedy search; only relevant when method="alcray". More rays lead to a more thorough, but more computationally intensive, search

laGP

applicable only to the R-version aGP.R, this is a function providing the local design implementation to be used. Either laGP or laGP.R can be provided, or a bespoke routine providing similar outputs

laGPsep

applicable only to the R-version aGPsep.R, this is a function providing the local design implementation to be used. Either laGPsep or laGPsep.R can be provided, or a bespoke routine providing similar outputs

num.gpus

applicable only to the C-version aGP, this is a scalar positive integer indicating the number of GPUs available for calculating ALC (see alcGP); the package must be compiled for CUDA support; see README/INSTALL in the package source for more details

gpu.threads

applicable only to the C-version aGP; this is a scalar positive integer indicating the number of SMP (i.e., CPU) threads queuing ALC jobs on a GPU; the package must be compiled for CUDA support. If gpu.threads >= 2 then the package must also be compiled for OpenMP support; see README/INSTALL in the package source for more details. We recommend setting gpu.threads to up to two-times the sum of the number of GPU devices and CPU cores. Only method = "alc" is supported when using CUDA. If the sum of omp.threads and gpu.threads is bigger than the max allowed by your system, then that max is used instead (giving gpu.threads preference)

omp.threads

applicable only to the C-version aGP; this is a scalar positive integer indicating the number of threads to use for SMP parallel processing; the package must be compiled for OpenMP support; see README/INSTALL in the package source for more details. For most Intel-based machines, we recommend setting omp.threads to up to two-times the number of hyperthreaded cores. When using GPUs (num.gpus > 0), a good default is omp.threads=0, otherwise load balancing could be required; see nn.gpu below. If the sum of omp.threads and gpu.threads is bigger than the max allowed by your system, then that max is used instead (giving gpu.threads preference)

nn.gpu

a scalar non-negative integer between 0 and nrow(XX) indicating the number of predictive locations utilizing GPU ALC calculations. Note this argument is only useful when both gpu.threads and omp.threads are non-zero, whereby it acts as a load balancing mechanism

verb

a non-negative integer specifying the verbosity level; verb = 0 is quiet, and larger values cause more progress information to be printed to the screen. The value max(0, verb-1) is provided to each laGP call

cls

a cluster object created by makeCluster from the parallel or snow packages

chunks

a scalar integer indicating the number of chunks to break XX into for parallel evaluation on cluster cls. Usually chunks = length(cls) is appropriate. However specifying more chunks can be useful when the nodes of the cluster are not homogeneous

M

an optional function taking two matrix inputs, of ncol(X)-ncalib and ncalib columns respectively, which is applied in lieu of aGP. This can be useful for calibration where the computer model is cheap, i.e., does not require emulation; more details below

ncalib

an integer between 1 and ncol(X) indicating how to partition X and XX inputs into the two matrices required for M

...

other arguments passed from aGP.seq to aGP

Details

This function invokes laGP with argument Xref = XX[i,] for each i=1:nrow(XX), building up local designs, inferring local correlation parameters, and obtaining predictions independently at each location. For more details see laGP.
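
For intuition, the following sketch mimics what aGP does for a single predictive location (the real implementation loops in C; X, Z, and XX are assumed to be as in the Examples below):

one <- laGP(Xref=XX[1,,drop=FALSE], start=6, end=50, X=X, Z=Z, method="alc")
c(one$mean, one$s2)  ## comparable to out$mean[1] and out$var[1] from aGP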

The function aGP.R is a prototype R-only version for debugging and transparency purposes. It is slower than aGP, which is primarily in C. However it may be useful for developing new programs that involve similar subroutines. Note that aGP.R may provide different output than aGP due to differing library subroutines deployed in R and C.

The function aGP.parallel allows aGP to be called on segments of the XX matrix distributed to a cluster created by parallel. It breaks XX into chunks which are sent to aGP workers pointed to by the entries of cls. The aGP.parallel function collects the outputs from each chunk before returning an object almost identical to what would have been returned from a single aGP call. On a single (SMP) node, this represents a poor-man's version of the OpenMP version described below. On multiple nodes both can be used.
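
A minimal sketch of the cluster workflow follows; the clusterEvalQ line, which loads laGP on each of the two local workers, is a precaution that may be unnecessary in some setups:

library(parallel)
cls <- makeCluster(2)
clusterEvalQ(cls, library(laGP))
outp <- aGP.parallel(cls, XX, chunks=length(cls), X=X, Z=Z, verb=0)
stopCluster(cls)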

If compiled with OpenMP flags, the independent calls to laGP will be farmed out to threads allowing them to proceed in parallel - obtaining nearly linear speed-ups. At this time aGP.R does not facilitate parallel computation, although a future version may exploit the parallel functionality for clustered parallel execution.
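
For example, assuming an OpenMP-enabled build, farming the laGP calls out to four threads is a one-argument change:

out.omp <- aGP(X, Z, XX, omp.threads=4, verb=0)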

If num.gpus > 0 then the ALC part of the independent calculations performed by each thread will be offloaded to a GPU. If both gpu.threads >= 1 and omp.threads >= 1, some of the ALC calculations will be done on the GPUs, and some on the CPUs. In our own experimentation we have not found this to lead to large speedups relative to omp.threads = 0 when using GPUs. For more details, see Gramacy, Niemi, & Weiss (2014).

The aGP.seq function is provided primarily for use in calibration exercises, see Gramacy, et al. (2015). It automates a sequence of aGP calls, each with a potentially different method, successively feeding the previous estimate of local lengthscale (d) in as an initial set of values for the next call. It also allows the use of aGP to be bypassed, feeding the inputs into a user-supplied M function instead. This feature is enabled when methods = FALSE. The M function takes two matrices (with the same number of rows) as inputs, where the first ncol(X) - ncalib columns represent “field data” inputs shared by the physical and computer model (in the calibration context), and the remaining ncalib are the extra tuning or calibration parameters required to evaluate the computer model. For examples illustrating aGP.seq please see the documentation file for discrep.est and demo("calib")
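
As a sketch of the methods = FALSE bypass, M below is a hypothetical cheap simulator (not from the package), and the inputs are assumed to carry one trailing calibration column (ncalib = 1):

M <- function(XF, XC) rowSums(XF^2) + XC[,1]  ## hypothetical cheap model
outM <- aGP.seq(X, Z, XX, d=NULL, methods=FALSE, M=M, ncalib=1)
## outM$mean holds the computer model output; outM$var is zero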

Value

The output is a list with the following components.

mean

a vector of predictive means of length nrow(XX)

var

a vector of predictive variances of length nrow(XX)

llik

a vector indicating the log likelihood/posterior probability of the data/parameter(s) under the chosen sub-design for each predictive location in XX; provided up to an additive constant

time

a scalar giving the passage of wall-clock time elapsed for (substantive parts of) the calculation

method

a copy of the method argument

d

a full-list version of the d argument, possibly completed by darg

g

a full-list version of the g argument, possibly completed by garg

mle

if d$mle and/or g$mle are TRUE, then mle is a data.frame containing the values found for these parameters, and the number of required iterations, for each predictive location in XX

Xi

when Xi.ret = TRUE, this field contains an nrow(XX)-by-end matrix of indices into X indicating the sub-design chosen for each predictive location in XX

close

a copy of the input argument

The aGP.seq function only returns the output from the final aGP call. When methods = FALSE and M is supplied, the returned object is a data frame with a mean column indicating the output of the computer model run, and a var column, which at this time is zero

Note

aGPsep provides the same functionality as aGP but deploys a separable covariance function. Criteria (methods) EFI and MSPE are not supported by aGPsep at this time.

Note that using method="NN" gives the same result as specifying start=end, however at some extra computational expense.

At this time, this function provides no facility to find local designs for the subset of predictive locations XX jointly, i.e., providing a matrix Xref to laGP. See laGP for more details/support for this alternative.

The use of OpenMP threads with aGPsep is not as efficient as with aGP when calculating MLEs with respect to the lengthscale (i.e., d=list(mle=TRUE, ...)). The reason is that the lbfgsb C entry point uses static variables, and is therefore not thread safe. To circumvent this problem, an OpenMP critical pragma is used, which can create a small bottleneck

Author(s)

Robert B. Gramacy [email protected]

References

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 9.) https://bobby.gramacy.com/surrogates/

R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R, Journal of Statistical Software, 72(1), 1-46; doi:10.18637/jss.v072.i01 or see vignette("laGP")

R.B. Gramacy and D.W. Apley (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), pp. 561-578; preprint on arXiv:1303.0383; https://arxiv.org/abs/1303.0383

R.B. Gramacy, J. Niemi, R.M. Weiss (2014). Massively parallel approximate Gaussian process regression. SIAM/ASA Journal on Uncertainty Quantification, 2(1), pp. 568-584; preprint on arXiv:1310.5182; https://arxiv.org/abs/1310.5182

R.B. Gramacy and B. Haaland (2016). Speeding up neighborhood search in local Gaussian process prediction. Technometrics, 58(3), pp. 294-303; preprint on arXiv:1409.0074; https://arxiv.org/abs/1409.0074

See Also

vignette("laGP"), laGP, alcGP, mspeGP, alcrayGP, makeCluster, clusterApply

Examples

## first, a "computer experiment"

## Simple 2-d test function used in Gramacy & Apley (2014);
## thanks to Lee, Gramacy, Taddy, and others who have used it before
f2d <- function(x, y=NULL)
  {
    if(is.null(y)){
      if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2)
      y <- x[,2]; x <- x[,1]
    }
    g <- function(z)
      return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1)))
    return(-g(x)*g(y))
  }

## build up a design with N=~40K locations
x <- seq(-2, 2, by=0.02)
X <- expand.grid(x, x)
Z <- f2d(X)

## predictive grid with NN=100 locations,
## change to length=100 (NN=10K) to mimic the setup in Gramacy & Apley (2015);
## the low NN set here is for fast CRAN checks
xx <- seq(-1.975, 1.975, length=10)
XX <- expand.grid(xx, xx)
ZZ <- f2d(XX)

## get the predictive equations, first based on Nearest Neighbor
out <- aGP(X, Z, XX, method="nn", verb=0)
## RMSE
sqrt(mean((out$mean - ZZ)^2))

## Not run: 
## refine with ALC
out2 <- aGP(X, Z, XX, method="alc", d=out$mle$d)
## RMSE
sqrt(mean((out2$mean - ZZ)^2))

## visualize the results
par(mfrow=c(1,3))
image(xx, xx, matrix(out2$mean, nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="predictive mean")
image(xx, xx, matrix(out2$mean-ZZ, nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="bias")
image(xx, xx, matrix(sqrt(out2$var), nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="sd")

## refine with MSPE
out3 <- aGP(X, Z, XX, method="mspe", d=out2$mle$d)
## RMSE
sqrt(mean((out3$mean - ZZ)^2))

## End(Not run)

## ALC-ray variation, which is much faster than the (not-run)
## alternatives above
out2r <- aGP(X, Z, XX, method="alcray", d=out$mle$d, verb=0)
sqrt(mean((out2r$mean - ZZ)^2))

## a simple example with estimated nugget
if(require("MASS")) {

  ## motorcycle data and predictive locations
  X <- matrix(mcycle[,1], ncol=1)
  Z <- mcycle[,2]
  XX <- matrix(seq(min(X), max(X), length=100), ncol=1)

  ## first stage
  out <- aGP(X=X, Z=Z, XX=XX, end=30, g=list(mle=TRUE), verb=0) 
  
  ## plot smoothed versions of the estimated parameters
  par(mfrow=c(2,1))
  df <- data.frame(y=log(out$mle$d), XX)
  lo <- loess(y~., data=df, span=0.25)
  plot(XX, log(out$mle$d), type="l")
  lines(XX, lo$fitted, col=2)
  dfnug <- data.frame(y=log(out$mle$g), XX)
  lonug <- loess(y~., data=dfnug, span=0.25)
  plot(XX, log(out$mle$g), type="l")
  lines(XX, lonug$fitted, col=2)

  ## second stage design
  out2 <- aGP(X=X, Z=Z, XX=XX, end=30, verb=0,
		  d=list(start=exp(lo$fitted), mle=FALSE),
		  g=list(start=exp(lonug$fitted)))
  
  ## plot the estimated surface
  par(mfrow=c(1,1))
  plot(X,Z)
  df <- 20
  s2 <- out2$var*(df-2)/df
  q1 <- qt(0.05, df)*sqrt(s2) + out2$mean
  q2 <- qt(0.95, df)*sqrt(s2) + out2$mean
  lines(XX, out2$mean)
  lines(XX, q1, col=1, lty=2)
  lines(XX, q2, col=1, lty=2)
}

## compare to the single-GP result provided in the mleGP documentation

Improvement statistics for sequential or local design

Description

Calculate the active learning Cohn (ALC) statistic, mean-squared predictive error (MSPE) or expected Fisher information (fish) for a Gaussian process (GP) predictor relative to a set of reference locations, towards sequential design or local search for Gaussian process regression

Usage

alcGP(gpi, Xcand, Xref = Xcand, parallel = c("none", "omp", "gpu"), 
      verb = 0)
alcGPsep(gpsepi, Xcand, Xref = Xcand, parallel = c("none", "omp", "gpu"), 
      verb = 0)
alcrayGP(gpi, Xref, Xstart, Xend, verb = 0)
alcrayGPsep(gpsepi, Xref, Xstart, Xend, verb = 0)
ieciGP(gpi, Xcand, fmin, Xref = Xcand, w = NULL, nonug = FALSE, verb = 0)
ieciGPsep(gpsepi, Xcand, fmin, Xref = Xcand, w = NULL, nonug = FALSE, verb = 0)
mspeGP(gpi, Xcand, Xref = Xcand, fi = TRUE, verb = 0)
fishGP(gpi, Xcand)
alcoptGP(gpi, Xref, start, lower, upper, maxit = 100, verb = 0)
alcoptGPsep(gpsepi, Xref, start, lower, upper, maxit = 100, verb = 0)
dalcGP(gpi, Xcand, Xref = Xcand, verb = 0)
dalcGPsep(gpsepi, Xcand, Xref = Xcand, verb = 0)

Arguments

gpi

a C-side GP object identifier (positive integer); e.g., as returned by newGP

gpsepi

a C-side separable GP object identifier (positive integer); e.g., as returned by newGPsep

Xcand

a matrix or data.frame containing a design of candidate predictive locations at which the ALC (or other) criteria is (are) evaluated. In the context of laGP, these are the possible locations for adding into the current local design

fmin

for ieci* only: a scalar value indicating the value of the best minimum found so far. This is usually set to the minimum of the Z-values stored in the gpi or gpsepi reference (for deterministic/low nugget settings), or otherwise the predicted mean value at the X locations

Xref

a matrix or data.frame containing a design of reference locations for ALC or MSPE. I.e., these are the locations at which the reduction in variance, or mean squared predictive error, are calculated. In the context of laGP, this is the single location, or set of reference locations, around which a local design (for accurate prediction) is sought. For alcrayGP and alcrayGPsep the matrix may only have one row, i.e., one reference location

parallel

a switch indicating if any parallel calculation of the criteria (method) is desired. For parallel = "omp", the package must be compiled with OpenMP flags; for parallel = "gpu", the package must be compiled with CUDA flags (only the ALC criterion is supported on the GPU); see README/INSTALL in the package source for more details

Xstart

a 1-by-ncol(Xref) starting location for a search along a ray between Xstart and Xend

Xend

a 1-by-ncol(Xref) ending location for a search along a ray between Xstart and Xend

fi

a scalar logical indicating if the expected Fisher information portion of the expression (MSPE is essentially ALC + c(x)*EFI) should be calculated (TRUE) or set to zero (FALSE). This flag is mostly for error checking against the other functions, alcGP and fishGP, since the constituent parts are separately available via those functions

w

weights on the reference locations Xref for IECI calculations; IECI, which stands for Integrated Expected Conditional Improvement, is not fully documented at this time. See Gramacy & Lee (2011) for more details.

nonug

a scalar logical indicating if a (nonzero) nugget should be used in the predictive equations behind IECI calculations; this allows the user to toggle improvement via predictive mean uncertainty versus full predictive uncertainty. The latter (default nonug = FALSE) is the standard approach, but the former may work better (citation forthcoming)

verb

a non-negative integer specifying the verbosity level; verb = 0 is quiet, and larger values cause more progress information to be printed to the screen

start

initial values to the derivative-based search via "L-BFGS-B" within alcoptGP and alcoptGPsep; a nearest neighbor often represents a sensible initialization

lower, upper

bounds on the derivative-based search via "L-BFGS-B" within alcoptGP and alcoptGPsep

maxit

the maximum number of iterations (default maxit=100) in "L-BFGS-B" search within alcoptGP and alcoptGPsep

Details

The best way to see how these functions are used in the context of local approximation is to inspect the code in the laGP.R function.

Otherwise they are pretty self-explanatory. They evaluate the ALC, MSPE, and EFI quantities outlined in Gramacy & Apley (2015). ALC is originally due to Seo, et al. (2000). The ray-based search is described by Gramacy & Haaland (2016).

MSPE and EFI calculations are not supported for separable GP models, i.e., there are no mspeGPsep or fishGPsep functions.

alcrayGP and alcrayGPsep allow only one reference location (nrow(Xref) = 1), whereas alcoptGP and alcoptGPsep allow multiple reference locations. The latter optimize a continuous ALC analog in its natural logarithm using the starting locations, bounding boxes and (stored) GP provided by gpi or gpsepi, and finally snap the solution back to the candidate grid. For details, see Sun, et al. (2019).

Note that ieciGP and ieciGPsep, which are for optimization via integrated expected conditional improvement (Gramacy & Lee, 2011) are “alpha” functionality and are not fully documented at this time.

Value

Except for alcoptGP, alcoptGPsep, dalcGP, and dalcGPsep, a vector of length nrow(Xcand) is returned, filled with values corresponding to the desired statistic. alcoptGP and alcoptGPsep return a list with the entries par through convergence below; dalcGP and dalcGPsep return a list with the entries alcs and dalcs

par

the best set of parameters/input configuration found on optimization

value

the optimized objective value corresponding to output par

its

a two-element integer vector giving the number of calls to the objective function and the gradient, respectively.

msg

a character string giving any additional information returned by the optimizer, or NULL

convergence

An integer code. 0 indicates successful completion. For the other error codes, see the documentation for optim

alcs

reduced predictive variance averaged over the reference locations

dalcs

the derivative of alcs with respect to the new location

Author(s)

Robert B. Gramacy [email protected] and Furong Sun [email protected]

References

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 9.) https://bobby.gramacy.com/surrogates/

F. Sun, R.B. Gramacy, B. Haaland, E. Lawrence, and A. Walker (2019). Emulating satellite drag from large simulation experiments, SIAM/ASA Journal on Uncertainty Quantification, 7(2), pp. 720-759; preprint on arXiv:1712.00182; https://arxiv.org/abs/1712.00182

R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R, Journal of Statistical Software, 72(1), 1-46; doi:10.18637/jss.v072.i01 or see vignette("laGP")

R.B. Gramacy and B. Haaland (2016). Speeding up neighborhood search in local Gaussian process prediction, Technometrics, 58(3), pp. 294-303; preprint on arXiv:1409.0074; https://arxiv.org/abs/1409.0074

R.B. Gramacy and D.W. Apley (2015). Local Gaussian process approximation for large computer experiments, Journal of Computational and Graphical Statistics, 24(2), pp. 561-578; preprint on arXiv:1303.0383; https://arxiv.org/abs/1303.0383

R.B. Gramacy, J. Niemi, R.M. Weiss (2014). Massively parallel approximate Gaussian process regression, SIAM/ASA Journal on Uncertainty Quantification, 2(1), pp. 568-584; preprint on arXiv:1310.5182; https://arxiv.org/abs/1310.5182

R.B. Gramacy, H.K.H. Lee (2011). Optimization under unknown constraints, Valencia discussion paper, in Bayesian Statistics 9. Oxford University Press; preprint on arXiv:1004.4027; https://arxiv.org/abs/1004.4027

S. Seo, M. Wallat, T. Graepel, and K. Obermayer (2000). Gaussian Process Regression: Active Data Selection and Test Point Rejection. In Proceedings of the International Joint Conference on Neural Networks, vol. III, 241-246. IEEE

See Also

laGP, aGP, predGP

Examples

## this follows the example in predGP, but only evaluates 
## information statistics documented here

## Simple 2-d test function used in Gramacy & Apley (2015);
## thanks to Lee, Gramacy, Taddy, and others who have used it before
f2d <- function(x, y=NULL)
  {
    if(is.null(y)) {
      if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2)
      y <- x[,2]; x <- x[,1]
    }
    g <- function(z)
      return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1)))
    return(-g(x)*g(y))
  }

## design with N=441
x <- seq(-2, 2, length=11)
X <- expand.grid(x, x)
Z <- f2d(X)

## fit a GP
gpi <- newGP(X, Z, d=0.35, g=1/1000, dK=TRUE)

## predictive grid with NN=400
xx <- seq(-1.9, 1.9, length=20)
XX <- expand.grid(xx, xx)

## evaluate the criteria over the predictive grid
alc <- alcGP(gpi, XX)
mspe <- mspeGP(gpi, XX)
fish <- fishGP(gpi, XX)

## visualize the result
par(mfrow=c(1,3))
image(xx, xx, matrix(sqrt(alc), nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="sqrt ALC")
image(xx, xx, matrix(sqrt(mspe), nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="sqrt MSPE")
image(xx, xx, matrix(log(fish), nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="log fish")

## clean up
deleteGP(gpi)


## 
## Illustrating some of the other functions in a sequential design context, 
## using X and XX above
## 

## new, much bigger design
x <- seq(-2, 2, by=0.02)
X <- expand.grid(x, x)
Z <- f2d(X)

## first build a local design of size 25, see laGP documentation
out <- laGP.R(XX, start=6, end=25, X, Z, method="alc", close=10000)

## extract that design and fit GP
XC <- X[out$Xi,] ## inputs
ZC <- Z[out$Xi]  ## outputs
gpi <- newGP(XC, ZC, d=out$mle$d, g=out$g$start)

## calculate the ideal "next" location via continuous ALC optimization
alco <- alcoptGP(gpi=gpi, Xref=XX, start=c(0,0), lower=range(x)[1], upper=range(x)[2])

## alco$par is the "new" location; calculate distances between candidates (remaining
## unchosen X locations) and this solution
Xcan <- X[-out$Xi,]
D <- distance(Xcan, matrix(alco$par, ncol=ncol(Xcan))) 

## snap the new location back to the candidate set
lab <- which.min(D) 
xnew <- Xcan[lab,] 
## add xnew to the local design, remove it from Xcan, and repeat

## evaluate the derivative at this new location
dalc <- dalcGP(gpi=gpi, Xcand=matrix(xnew, nrow=1), Xref=XX)

## clean up
deleteGP(gpi)

Bootstrapped block Latin hypercube subsampling

Description

Provides bootstrapped block Latin hypercube subsampling under a given data set to aid in consistent estimation of a global separable lengthscale parameter

Usage

blhs(y, X, m)
blhs.loop(y, X, m, K, da, g = 1e-3, maxit = 100, verb = 0, plot.it = FALSE)

Arguments

y

a vector of responses/dependent values with length(y) = nrow(X)

X

a matrix or data.frame containing the full (large) design matrix of input locations

m

a positive scalar integer giving the number of divisions on each coordinate of input space defining the block structure

K

a positive scalar integer specifying the number of Bootstrap replicates desired

da

a lengthscale prior, say as generated by darg

g

a positive scalar giving the fixed value of the nugget parameter; by default g = 1e-3

maxit

a positive scalar integer giving the maximum number of iterations for MLE calculations via "L-BFGS-B"; see mleGPsep for more details

verb

a non-negative integer specifying the verbosity level; verb = 0 (by default) is quiet, and larger values cause more progress information to be printed to the screen

plot.it

plot.it = FALSE by default; if plot.it = TRUE, then each of the K lengthscale estimates from bootstrap iterations will be shown via boxplot

Details

Bootstrapped block Latin hypercube subsampling (BLHS) yields a global lengthscale estimator which is asymptotically consistent with the MLE calculated on the full data set. However, since it works on data subsets, it comes at a much reduced computational cost. Intuitively, the BLHS guarantees a good mix of short and long pairwise distances. A single bootstrap LH subsample may be obtained by dividing each dimension of the input space equally into m intervals, yielding m^d mutually exclusive hypercubes. It is easy to show that the average number of observations in each hypercube is N*m^(-d) if there are N samples in the original design. From each of these hypercubes, m blocks are randomly selected following the LH paradigm, i.e., so that only one interval is chosen from each of the m segments. The average number of observations in the subsample, combining the m randomly selected blocks, is N*m^(-d+1).

Ensuring a subsample size of at least one requires having m <= N^(1/(d-1)), thereby linking the parameter m to computational effort. Smaller m is preferred so long as GP inference on data of that size remains tractable. Since the blocks follow an LH structure, the resulting sub-design inherits the usual LHS properties, e.g., retaining marginal properties like univariate stratification modulo features present in the original, large N, design.
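
A quick numeric check of these counts, matching the N=216, m=6 setting in the Examples below:

N <- 216; d <- 2; m <- 6
m^d           ## 36 mutually exclusive hypercubes
N * m^(-d)    ## 6 observations per hypercube, on average
N * m^(-d+1)  ## 36 observations in the combined subsample, on average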

For more details, see Liu (2014), Zhao, et al. (2017) and Sun, et al. (2019).

blhs returns the subsampled input space and the corresponding responses.

blhs.loop returns the median of the K lengthscale maximum likelihood estimates, the subsampled data size to which that corresponds, and the subsampled data, including the input space and the responses, from the bootstrap iterations

Value

blhs returns

xs

the subsampled input space

ys

the subsampled responses, length(ys) = nrow(xs)

blhs.loop returns

that

the lengthscale estimate (median), length(that) = ncol(X)

ly

the subsampled data size (median)

xm

the subsampled input space (median)

ym

the subsampled responses (median)

Note

This implementation assumes that X has been coded to the unit cube ([0,1]^p), where p = ncol(X).

X should be relatively homogeneous. A space-filling design (input) X is ideal, but not required

Author(s)

Robert B. Gramacy [email protected] and Furong Sun [email protected]

References

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 9.) https://bobby.gramacy.com/surrogates/

F. Sun, R.B. Gramacy, B. Haaland, E. Lawrence, and A. Walker (2019). Emulating satellite drag from large simulation experiments, SIAM/ASA Journal on Uncertainty Quantification, 7(2), pp. 720-759; preprint on arXiv:1712.00182; https://arxiv.org/abs/1712.00182

Y. Zhao, Y. Hung, and Y. Amemiya (2017). Efficient Gaussian Process Modeling using Experimental Design-Based Subagging, Statistica Sinica, to appear.

Yufan Liu (2014) Recent Advances in Computer Experiment Modeling. Ph.D. Thesis at Rutgers, The State University of New Jersey. https://dx.doi.org/doi:10.7282/T38G8J1H

Examples

  # input space based on Latin hypercube sampling (not required)
  # two-dimensional example with an N=216 sized sample
  if(require(lhs)) { X <- randomLHS(216, 2)  
  } else { X <- matrix(runif(216*2), ncol=2) }
  # pseudo responses, not important for visualizing design
  Y <- runif(216) 
  
  ## BLHS sample with m=6 divisions in each coordinate
  sub <- blhs(y=Y, X=X, m=6)
  Xsub <- sub$xs # the bootstrapped subsample
  
  # visualization
  plot(X, xaxt="n", yaxt="n", xlim=c(0,1), ylim=c(0,1), xlab="factor 1", 
    ylab="factor 2", col="cyan", main="BLHS")
  b <- seq(0, 1, by=1/6)
  abline(h=b, v=b, col="black", lty=2)
  axis(1, at=seq(0, 1, by=1/6), cex.axis=0.8, 
    labels=expression(0, 1/6, 2/6, 3/6, 4/6, 5/6, 1))
  axis(2, at=seq(0, 1, by=1/6), cex.axis=0.8, 
    labels=expression(0, 1/6, 2/6, 3/6, 4/6, 5/6, 1), las=1)
  points(Xsub, col="red", pch=19, cex=1.25)
  
  ## Comparing global lengthscale MLE based on BLHS and random subsampling
  ## Not run: 
    # famous borehole function
    borehole <- function(x){
      rw <- x[1] * (0.15 - 0.05) + 0.05
      r <-  x[2] * (50000 - 100) + 100
      Tu <- x[3] * (115600 - 63070) + 63070
      Tl <- x[5] * (116 - 63.1) + 63.1
      Hu <- x[4] * (1110 - 990) + 990
      Hl <- x[6] * (820 - 700) + 700
      L <-  x[7] * (1680 - 1120) + 1120
      Kw <- x[8] * (12045 - 9855) + 9855
      m1 <- 2 * pi * Tu * (Hu - Hl)
      m2 <- log(r / rw)
      m3 <- 1 + 2*L*Tu/(m2*rw^2*Kw) + Tu/Tl
      return(m1/m2/m3)
    }
    
    N <- 100000                   # number of observations
    if(require(lhs)) { xt <- randomLHS(N, 8)   # input space
    } else { xt <- matrix(runif(N*8), ncol=8) }
    yt <- apply(xt, 1, borehole)  # response
    colnames(xt) <- c("rw", "r", "Tu", "Tl", "Hu", "Hl", "L", "Kw")

    ## prior on the GP lengthscale parameter
    da <- darg(list(mle=TRUE, max=100), xt)

    ## make space for two sets of boxplots
    par(mfrow=c(1,2))
    
    # BLHS calculating with visualization of the K MLE lengthscale estimates
    K <- 10  # number of Bootstrap samples; Sun, et al. (2019) use K <- 31
    sub_blhs <- blhs.loop(y=yt, X=xt, K=K, m=2, da=da, maxit=200, plot.it=TRUE)
  
    # a random subsampling analog for comparison
    sn <- sub_blhs$ly # extract a size that is consistent with the BLHS
    that.rand <- matrix(NA, ncol=8, nrow=K)
    for(i in 1:K){
      sub <- sample(1:nrow(xt), sn)
      gpsepi <- newGPsep(xt[sub,], yt[sub], d=da$start, g=1e-3, dK=TRUE)
      mle <- mleGPsep(gpsepi, tmin=da$min, tmax=10*da$max, ab=da$ab, maxit=200)
      deleteGPsep(gpsepi)
      that.rand[i,] <- mle$d
    }

    ## put random boxplots next to BLHS ones
    boxplot(that.rand, xlab="input", ylab="theta-hat", col=2, 
      main="random subsampling")
  
## End(Not run)

Generate Priors for GP correlation

Description

Generate empirical Bayes regularization (priors) and choose initial values and ranges for (isotropic) lengthscale and nugget parameters to a Gaussian correlation function for a GP regression model

Usage

darg(d, X, samp.size = 1000)
garg(g, y)

Arguments

d

can be NULL, or a scalar indicating an initial value or a partial list whose format matches the one described in the Value section below

g

can be NULL, or a scalar indicating an initial value or a partial list whose format matches the one described in the Value section below

X

a matrix or data.frame containing the full (large) design matrix of input locations

y

a vector of responses/dependent values

samp.size

a scalar integer indicating a subset size of X to use for calculations; this is important for very large X matrices since the calculations are quadratic in nrow(X)

Details

These functions use aspects of the data, either X or y, to form weakly informative default priors and choose initial values for a lengthscale and nugget parameter. This is useful since the likelihood can sometimes be very flat, and even with proper priors inference can be sensitive to the specification of those priors and any initial search values. The focus here is on avoiding pathologies while otherwise remaining true to the spirit of MLE calculation.

darg output specifies MLE inference (out$mle = TRUE) by default, whereas garg instead fixes the nugget at the starting value, which may be sensible for emulating deterministic computer simulation data; when out$mle = FALSE the calculated range outputs c(out$min, out$max) are set to dummy values that are ignored in other parts of the laGP package.

darg calculates a Gaussian distance matrix between all pairs of X rows, or a subsample of rows of size samp.size. From those distances it chooses the range and start values from the range of (non-zero) distances and the 0.1 quantile, respectively. The Gamma prior values have a shape of out$a = 3/2 and a rate out$b chosen by the incomplete Gamma inverse function to put 0.95 probability below out$max.

garg is similar except that it works with (y - mean(y))^2 instead of the pairwise distances of darg, and its starting value is chosen as the 2.5% quantile.
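
A small sketch inspecting the generated priors on toy data (values will vary with the random draw):

X <- matrix(runif(200), ncol=2)
y <- rowSums(X) + rnorm(100, sd=0.1)
str(darg(NULL, X))            ## mle=TRUE; start from the 0.1 distance quantile
str(garg(list(mle=TRUE), y))  ## request nugget inference rather than fixing it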

Value

Both functions return a list containing the following entries. If the input object (d or g) specifies one of the values then that value is copied to the same list entry on output. See the Details section for how these values are calculated

mle

by default, TRUE for darg and FALSE for garg

start

starting value chosen from the quantiles of distance(X) or (y - mean(y))^2

min

minimum value in allowable range for the parameter - for future inference purposes

max

maximum value in allowable range for the parameter - for future inference purposes

ab

shape and rate parameters specifying a Gamma prior for the parameter

Author(s)

Robert B. Gramacy [email protected]

See Also

vignette("laGP"), laGP, aGP, mleGP, distance, llikGP

Examples

## motorcycle data
if(require("MASS")) {
  X <- matrix(mcycle[,1], ncol=1)
  Z <- mcycle[,2]

  ## get darg and garg
  darg(NULL, X)
  garg(list(mle=TRUE), Z)
}

Delete C-side Gaussian Process Objects

Description

Frees memory allocated by a particular C-side Gaussian process object, or all GP objects currently allocated

Usage

deleteGP(gpi)
deleteGPsep(gpsepi)
deleteGPs()
deleteGPseps()

Arguments

gpi

a scalar positive integer specifying an allocated isotropic GP object

gpsepi

similar to gpi but indicating a separable GP object

Details

Any function calling newGP or newGPsep will require destruction via these functions or there will be a memory leak
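
A minimal sketch of a safe allocate/use/free pattern follows; on.exit ensures the C-side object is freed even if prediction fails, and the d and g settings are placeholders only:

gp_predict <- function(X, Z, XX)
  {
    gpi <- newGP(X, Z, d=0.1, g=1e-6, dK=TRUE)
    on.exit(deleteGP(gpi), add=TRUE)  ## free C-side memory on every exit path
    predGP(gpi, XX, lite=TRUE)
  }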

Value

Nothing is returned

Author(s)

Robert B. Gramacy [email protected]

See Also

vignette("laGP"), newGP, predGP, mleGP

Examples

## see examples for newGP, predGP, or mleGP

Estimate Discrepancy in Calibration Model

Description

Estimates the Gaussian process discrepancy/bias and/or noise term in a modularized calibration of a computer model (emulator) to field data, and returns the log likelihood or posterior probability

Usage

discrep.est(X, Y, Yhat, d, g, bias = TRUE, clean = TRUE)

Arguments

X

a matrix or data.frame containing a design matrix of input locations for field data sites. Any columns of X without at least three unique input settings are dropped in a pre-processing step

Y

a vector of values with length(Y) = nrow(X) containing the response from field data observations at X. A Y-vector with length(Y) = k*nrow(X), for positive integer k, can be supplied in which case the multiple Y-values will be treated as replicates at the X-values

Yhat

a vector with length(Yhat) = length(Y) containing predictions at X from an emulator of a computer simulation

d

a prior or initial setting for the (single/isotropic) lengthscale parameter in a Gaussian correlation function; a (default) NULL value triggers a sensible regularization (prior) and initial setting to be generated via darg; a scalar specifies an initial value, causing darg to only generate the prior; otherwise, a list or partial list matching the output of darg can be used to specify a custom prior. In the case of a partial list, only the missing entries will be generated. Note that a default/generated list specifies MLE/MAP inference for this parameter

g

a prior or initial setting for the nugget parameter; a NULL value causes a sensible regularization (prior) and initial setting to be generated via garg; a scalar (default g = 1/1000) specifies an initial value, causing garg to only generate the prior; otherwise, a list or partial list matching the output of garg can be used to specify a custom prior. In the case of a partial list, only the missing entries will be generated. Note that a default/generated list specifies no inference for this parameter; i.e., it is fixed at its starting value, which may be appropriate for emulating deterministic computer code output

bias

a scalar logical indicating whether an (isotropic) GP discrepancy should be estimated (TRUE) or a Gaussian noise term only (FALSE)

clean

a scalar logical indicating if the C-side GP object should be freed before returning.

Details

Estimates a Gaussian process (GP) discrepancy term, with isotropic Gaussian correlation, for the difference between a computer model output (Yhat) and field data observations (Y) at locations X. The computer model predictions would typically come from a GP emulator trained on simulation data, possibly via aGP if the computer experiment is large.

This function is used primarily as a subroutine by fcalib which defines an objective function for optimization in order to solve the calibration problem via the method described by Gramacy, et al. (2015), designed for large computer experiments. However, once calibration is performed this function can be useful for making comparisons to other methods. Examples are provided in the fcalib documentation.

When bias=FALSE no discrepancy is estimated; only a zero-mean Gaussian error distribution is assumed
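
A sketch contrasting the two modes, where Yhat is assumed to hold emulator predictions at X (see the Examples below for a full workflow):

fit.b <- discrep.est(X, Y, Yhat, d=NULL, g=1e-4, bias=TRUE)   ## GP discrepancy
fit.n <- discrep.est(X, Y, Yhat, d=NULL, g=1e-4, bias=FALSE)  ## noise term only
c(fit.b$ll, fit.n$ll)  ## compare log likelihoods/posterior probabilities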

Value

The output object comprises the output of jmleGP, applied to a GP object built with responses Y - Yhat. That object is augmented with a log likelihood, in $ll, and with a GP index $gp when clean=FALSE. When bias = FALSE the output object retains the same form as above, except with dummy zero-values since calling jmleGP is not required

Note

Note that in principle a separable correlation function could be used (e.g., via newGPsep and mleGPsep), however this is not implemented at this time

Author(s)

Robert B. Gramacy [email protected]

References

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 8.) https://bobby.gramacy.com/surrogates/

R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R, Journal of Statistical Software, 72(1), 1-46; doi:10.18637/jss.v072.i01 or see vignette("laGP")

R.B. Gramacy, D. Bingham, JP. Holloway, M.J. Grosskopf, C.C. Kuranz, E. Rutter, M. Trantham, P.R. Drake (2015). Calibrating a large computer experiment simulating radiative shock hydrodynamics. Annals of Applied Statistics, 9(3) 1141-1168; preprint on arXiv:1410.3293 https://arxiv.org/abs/1410.3293

F. Liu, M. Bayarri and J. Berger (2009). Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis, 4(1) 119-150.

See Also

vignette("laGP"), jmleGP, newGP, aGP.seq, fcalib

Examples

## the example here combines aGP.seq and discrep.est functions; 
## it is comprised of snippets from demo("calib"), which contains
## code from the Calibration Section of vignette("laGP")

## Here we generate calibration data using a true calibration
## parameter, u, and then evaluate log posterior probabilities
## and out-of-sample RMSEs for that u value;  the fcalib 
## documentation repeats this with a single call to fcalib rather 
## than first aGP.seq and then discrep.est

## begin data-generation code identical to aGP.seq, discrep.est, fcalib
## example sections and demo("calib")

## M: computer model test function used in Goh et al, 2013 (Technometrics)
## an elaboration of one from Bastos and O'Hagan, 2009 (Technometrics) 
M <- function(x,u) 
  {
    x <- as.matrix(x)
    u <- as.matrix(u)
    out <- (1-exp(-1/(2*x[,2]))) 
    out <- out * (1000*u[,1]*x[,1]^3+1900*x[,1]^2+2092*x[,1]+60) 
    out <- out / (100*u[,2]*x[,1]^3+500*x[,1]^2+4*x[,1]+20)  
    return(out)
  }
  
## bias: discrepancy function from Goh et al, 2013 
bias <- function(x) 
  {
    x<-as.matrix(x)   
    out<- 2*(10*x[,1]^2+4*x[,2]^2) / (50*x[,1]*x[,2]+10)
    return(out)
  }

## beta.prior: marginal beta prior for u, 
## defaults to a mode at 1/2 in hypercube
beta.prior <- function(u, a=2, b=2, log=TRUE)
{
  if(length(a) == 1) a <- rep(a, length(u))
  else if(length(a) != length(u)) stop("length(a) must be 1 or length(u)")
  if(length(b) == 1) b <- rep(b, length(u))
  else if(length(b) != length(u)) stop("length(b) must be 1 or length(u)")
  if(log) return(sum(dbeta(u, a, b, log=TRUE)))
  else return(prod(dbeta(u, a, b, log=FALSE)))
}

## tgp for LHS sampling
library(tgp)
rect <- matrix(rep(0:1, 4), ncol=2, byrow=TRUE)

## training and testing inputs
ny <- 50; nny <- 1000  
X <- lhs(ny, rect[1:2,])    ## computer model train
XX <- lhs(nny, rect[1:2,]) ## test

## true (but unknown) setting of the calibration parameter
## for the computer model
u <- c(0.2, 0.1)
Zu <- M(X, matrix(u, nrow=1)) 
ZZu <- M(XX, matrix(u, nrow=1)) 

## field data response, biased and replicated
sd <- 0.5
## Y <- computer output + bias + noise
reps <- 2 ## example from paper uses reps <- 10
Y <- rep(Zu,reps) + rep(bias(X),reps) + rnorm(reps*length(Zu), sd=sd) 
YYtrue <- ZZu + bias(XX) 
## variations: remove the bias or change 2 to 1 to remove replicates

## computer model design
nz <- 10000
XT <- lhs(nz, rect)
nth <- 1 ## number of threads to use in emulation, demo uses 8

## augment with physical model design points 
## with various u settings
XT2 <- matrix(NA, nrow=10*ny, ncol=4)
for(i in 1:10) {
  I <- ((i-1)*ny+1):(ny*i)
  XT2[I,1:2] <- X
}
XT2[,3:4] <- lhs(10*ny, rect[3:4,])
XT <- rbind(XT, XT2)

## evaluate the computer model
Z <- M(XT[,1:2], XT[,3:4])

## flag indicating if estimating bias/discrepancy or not
bias.est <- TRUE
## two passes of ALC with MLE calculations for aGP.seq
methods <- rep("alcray", 2) ## demo uses rep("alc", 2)

## set up priors
da <- d <- darg(NULL, XT)
g <- garg(list(mle=TRUE), Y) 

## end identical data generation code

## now calculate log posterior and do out-of-sample RMSE calculation
## for true calibration parameter value (u).  You could repeat
## this for an estimate value from demo("calib"), for example
## u.hat <- c(0.8236673, 0.1406989)

## first log posterior

## emulate at field-data locations Xu
Xu <- cbind(X, matrix(rep(u, ny), ncol=2, byrow=TRUE))
ehat.u <- aGP.seq(XT, Z, Xu, da, methods, ncalib=2, omp.threads=nth, verb=0)

## estimate discrepancy from the residual
cmle.u <- discrep.est(X, Y, ehat.u$mean, d, g, bias.est, FALSE)
cmle.u$ll <- cmle.u$ll + beta.prior(u)
print(cmle.u$ll)
## compare to same calculation with u.hat above

## now RMSE
## Not run: 
## predictive design with true calibration parameter
XXu <- cbind(XX, matrix(rep(u, nny), ncol=2, byrow=TRUE))

## emulate at predictive design
ehat.oos.u <- aGP.seq(XT, Z, XXu, da, methods, ncalib=2, 
  omp.threads=nth, verb=0)

## predict via discrepancy
YYm.pred.u <- predGP(cmle.u$gp, XX)

## add in emulation
YY.pred.u <- YYm.pred.u$mean + ehat.oos.u$mean

## calculate RMSE
rmse.u <- sqrt(mean((YY.pred.u - YYtrue)^2))
print(rmse.u)
## compare to same calculation with u.hat above

## clean up
deleteGP(cmle.u$gp)

## End(Not run)

Calculate the squared Euclidean distance between pairs of points

Description

Calculate the squared Euclidean distance between pairs of points and return a distance matrix

Usage

distance(X1, X2 = NULL)

Arguments

X1

a matrix or data.frame containing real-valued numbers

X2

an optional matrix or data.frame containing real-valued numbers; must have ncol(X2) = ncol(X1)

Details

If X2 = NULL distances between X1 and itself are calculated, resulting in an nrow(X1)-by-nrow(X1) distance matrix. Otherwise the result is nrow(X1)-by-nrow(X2) and contains distances between X1 and X2.

Calling distance(X) is the same as distance(X,X)

Value

The output is a matrix, whose dimensions are described in the Details section above

Author(s)

Robert B. Gramacy [email protected]

See Also

darg

Examples

x <- seq(-2, 2, length=11)
X <- as.matrix(expand.grid(x, x))
## predictive grid with NN=400
xx <- seq(-1.9, 1.9, length=20)
XX <- as.matrix(expand.grid(xx, xx))

D <- distance(X)
DD <- distance(X, XX)
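
## sanity check (sketch): distance() returns squared Euclidean distances,
## so it should agree with the square of base R's dist()
all.equal(D, as.matrix(dist(X))^2, check.attributes=FALSE)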

Objective function for performing large scale computer model calibration via optimization

Description

Defines an objective function for performing blackbox optimization towards solving a modularized calibration of large computer model simulation to field data

Usage

fcalib(u, XU, Z, X, Y, da, d, g, uprior = NULL, methods = rep("alc", 2), 
  M = NULL, bias = TRUE, omp.threads = 1, save.global = FALSE, verb = 1)

Arguments

u

a vector of length ncol(XU) - ncol(X) containing a setting of the calibration parameter

XU

a matrix or data.frame containing the full (large) design matrix of input locations to a computer simulator whose final ncol(XU) - ncol(X) columns contain settings of a calibration or tuning parameter like u

Z

a vector of responses/dependent values, with length(Z) = nrow(XU), of computer model outputs at XU

X

a matrix or data.frame containing the full (large) design matrix of input locations

Y

a vector of values with length(Y) = nrow(X) containing the response from field data observations at X. A Y-vector with length(Y) = k*nrow(X), for positive integer k, can be supplied in which case the multiple Y-values will be treated as replicates at the X-values

da

for emulating Z at XU: a prior or initial setting for the (single/isotropic) lengthscale parameter in a Gaussian correlation function; a (default) NULL value triggers a sensible regularization (prior) and initial setting to be generated via darg; a scalar specifies an initial value, causing darg to only generate the prior; otherwise, a list or partial list matching the output of darg can be used to specify a custom prior. In the case of a partial list, only the missing entries will be generated. Note that a default/generated list specifies MLE/MAP inference for this parameter

d

for the discrepancy between emulations Yhat at X, based on Z at XU, and the outputs Y observed at X; otherwise, same description as da above

g

for the nugget in the GP model for the discrepancy between emulation Yhat at X, based on Z at XU, and the outputs Y observed at X: a prior or initial setting for the nugget parameter; a NULL value causes a sensible regularization (prior) and initial setting to be generated via garg; a scalar (default g = 1/1000) specifies an initial value, causing garg to only generate the prior; otherwise, a list or partial list matching the output of garg can be used to specify a custom prior. In the case of a partial list, only the missing entries will be generated. Note that a default/generated list specifies no inference for this parameter; i.e., it is fixed at its starting value, which may be appropriate for emulating deterministic computer code output. At this time, estimating a nugget for the computer model emulator is not supported by fcalib

uprior

an optional function taking a vector u argument and returning a log prior density value for the calibration parameter

methods

a sequence of local search methods to be deployed when emulating Z at XU via aGP; see aGP.seq for more details; provide methods = FALSE to use the computer model M directly

M

a computer model “simulation” function taking two matrices as inputs, to be used in lieu of emulation; see aGP.seq for more details

bias

a scalar logical indicating whether a GP discrepancy or bias term should be estimated via discrep.est, as opposed to only a Gaussian (zero-mean) variance; see discrep.est for more details

omp.threads

a scalar positive integer indicating the number of threads to use for SMP parallel processing; see aGP for more details

save.global

an environment, e.g., .GlobalEnv, in which each evaluation of fcalib, say as called by a wrapper or optimization routine, should be saved. The variable used in that environment will be fcalib.save. Otherwise save.global = FALSE will skip saving the information

verb

a non-negative integer specifying the verbosity level; verb = 0 is quiet, whereas a larger value causes each evaluation to be printed to the screen

Details

Gramacy, et al. (2015) defined an objective function which, when optimized, returns a setting of calibration parameters under a setup akin to the modularized calibration method of Liu, et al. (2009). The fcalib function returns a log density (likelihood or posterior probability) value obtained by performing emulation at a set of inputs X augmented with a value of the calibration parameter, u. The emulator is trained on XU and Z, presumed to be very large relative to the size of the field data set X and Y, necessitating the use of approximate methods like aGP, via aGP.seq. The emulated values, call them Yhat, are fed along with X and Y into the discrep.est function, whose likelihood or posterior calculation serves as a measure of merit for the value u.

The fcalib function is deterministic but, as Gramacy, et al. (2015) described, can result in a rugged objective surface for optimization, meaning that conventional methods, like those in optim, are unlikely to work well. They instead recommend using a blackbox derivative-free method, like NOMAD (Le Digabel, 2011). In our example below we use the implementation in the crs package, which provides an R wrapper around the underlying C library.

Note that while fcalib automates a call first to aGP.seq and then to discrep.est, it does not return enough information to complete, say, an out-of-sample prediction exercise like the one demonstrated in the discrep.est documentation. Therefore, after fcalib is used in an optimization to find the best setting of the calibration parameter, u, those functions must then be used in post-processing to complete a prediction exercise. See demo("calib") or vignette("laGP") for more details
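
As a rough sketch only, fcalib can be wrapped as a scalar objective in u; the variable names below follow the Examples section and demo("calib"). Since fcalib returns a negative log density, smaller is better. The commented optim call is purely illustrative; as cautioned above, demo("calib") instead uses snomadr from crs:

obj <- function(uu)
  fcalib(uu, XU, Z, X, Y, da, d, g, beta.prior, methods,
    M=NULL, bias=TRUE, omp.threads=nth, verb=0)
## opt <- optim(c(0.5, 0.5), obj, method="Nelder-Mead")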

Value

Returns a scalar measuring the negative log likelihood or posterior density of the calibration parameter u given the other inputs, for the purpose of optimization over u

Note

Note that in principle a separable correlation function could be used (e.g., via newGPsep and mleGPsep), however this is not implemented at this time

Author(s)

Robert B. Gramacy [email protected]

References

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 8.) https://bobby.gramacy.com/surrogates/

R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R., Journal of Statistical Software, 72(1), 1-46; doi:10.18637/jss.v072.i01 or see vignette("laGP")

R.B. Gramacy, D. Bingham, JP. Holloway, M.J. Grosskopf, C.C. Kuranz, E. Rutter, M. Trantham, and P.R. Drake (2015). Calibrating a large computer experiment simulating radiative shock hydrodynamics. Annals of Applied Statistics, 9(3) 1141-1168; preprint on arXiv:1410.3293 https://arxiv.org/abs/1410.3293

F. Liu, M. Bayarri, and J. Berger (2009). Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis, 4(1) 119-150.

S. Le Digabel (2011). Algorithm 909: NOMAD: Nonlinear Optimization with the MADS algorithm. ACM Transactions on Mathematical Software, 37, 4, 44:1-44:15.

J.S. Racine and Z. Nie (2012). crs: Categorical regression splines. R package version 0.15-18.

See Also

vignette("laGP"), jmleGP, newGP, aGP.seq, discrep.est, snomadr

Examples

## the example here illustrates how fcalib combines aGP.seq and 
## discrep.est functions, duplicating the example in the discrep.est
## documentation file.  It is comprised of snippets from demo("calib"), 
## which contains code from the Calibration Section of vignette("laGP")

## Here we generate calibration data using a true calibration
## parameter, u, and then evaluate log posterior probabilities; 
## the discrep.est documentation repeats this by first calling
## aGP.seq and then discrep.est.  The answers should be identical, however
## note that a call first to example("fcalib") and then 
## example("discrep.est") will generate two random data sets, causing
## the results not to match

## begin data-generation code identical to aGP.seq, discrep.est, fcalib
## example sections and demo("calib")

## M: computer model test function used in Goh et al, 2013 (Technometrics)
## an elaboration of one from Bastos and O'Hagan, 2009 (Technometrics) 
M <- function(x,u) 
  {
    x <- as.matrix(x)
    u <- as.matrix(u)
    out <- (1-exp(-1/(2*x[,2]))) 
    out <- out * (1000*u[,1]*x[,1]^3+1900*x[,1]^2+2092*x[,1]+60) 
    out <- out / (100*u[,2]*x[,1]^3+500*x[,1]^2+4*x[,1]+20)  
    return(out)
  }
  
## bias: discrepancy function from Goh et al, 2013 
bias <- function(x) 
  {
    x <- as.matrix(x)   
    out <- 2*(10*x[,1]^2+4*x[,2]^2) / (50*x[,1]*x[,2]+10)
    return(out)
  }

## beta.prior: marginal beta prior for u, 
## defaults to a mode at 1/2 in hypercube
beta.prior <- function(u, a=2, b=2, log=TRUE)
{
  if(length(a) == 1) a <- rep(a, length(u))
  else if(length(a) != length(u)) stop("length(a) must be 1 or length(u)")
  if(length(b) == 1) b <- rep(b, length(u))
  else if(length(b) != length(u)) stop("length(b) must be 1 or length(u)")
  if(log) return(sum(dbeta(u, a, b, log=TRUE)))
  else return(prod(dbeta(u, a, b, log=FALSE)))
}

## tgp for LHS sampling
library(tgp)
rect <- matrix(rep(0:1, 4), ncol=2, byrow=TRUE)

## training inputs
ny <- 50; 
X <- lhs(ny, rect[1:2,])    ## computer model train

## true (but unknown) setting of the calibration parameter
## for the computer model
u <- c(0.2, 0.1)
Zu <- M(X, matrix(u, nrow=1)) 

## field data response, biased and replicated
sd <- 0.5
## Y <- computer output + bias + noise
reps <- 2 ## example from paper uses reps <- 10
Y <- rep(Zu,reps) + rep(bias(X),reps) + rnorm(reps*length(Zu), sd=sd) 
## variations: remove the bias or change 2 to 1 to remove replicates

## computer model design
nz <- 10000
XU <- lhs(nz, rect)
nth <- 1 ## number of threads to use in emulation, demo uses 8

## augment with physical model design points 
## with various u settings
XU2 <- matrix(NA, nrow=10*ny, ncol=4)
for(i in 1:10) {
  I <- ((i-1)*ny+1):(ny*i)
  XU2[I,1:2] <- X
}
XU2[,3:4] <- lhs(10*ny, rect[3:4,])
XU <- rbind(XU, XU2)

## evaluate the computer model
Z <- M(XU[,1:2], XU[,3:4])

## flag indicating if estimating bias/discrepancy or not
bias.est <- TRUE
## two passes of ALC with MLE calculations for aGP.seq
methods <- rep("alcray", 2) ## demo uses rep("alc", 2)

## set up priors
da <- d <- darg(NULL, XU)
g <- garg(list(mle=TRUE), Y) 

## end identical data generation code

## now calculate log posterior for true calibration parameter 
## value (u).  You could repeat this for an estimated value 
## from demo("calib"), for example u.hat <- c(0.8236673, 0.1406989)

fcalib(u, XU, Z, X, Y, da, d, g, beta.prior, methods, M, bias.est, nth)

Localized Approximate GP Prediction At a Single Input Location

Description

Build a sub-design of X of size end, and infer parameters, for approximate Gaussian process prediction at reference location(s) Xref. Return the moments of those predictive equations, and indices into the local design

Usage

laGP(Xref, start, end, X, Z, d = NULL, g = 1/10000,
     method = c("alc", "alcopt", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
     close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)), 
     alc.gpu = FALSE, numstart = if(method[1] == "alcray") ncol(X) else 1, 
     rect = NULL, lite = TRUE, verb = 0)
laGP.R(Xref, start, end, X, Z, d = NULL, g = 1/10000,
     method = c("alc", "alcopt", "alcray", "mspe", "nn", "fish"), 
     Xi.ret = TRUE, pall = FALSE, 
     close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)),
     parallel = c("none", "omp", "gpu"), 
     numstart = if(method[1] == "alcray") ncol(X) else 1, 
     rect = NULL, lite = TRUE, verb = 0)
laGPsep(Xref, start, end, X, Z, d = NULL, g = 1/10000,
     method = c("alc", "alcopt", "alcray", "nn"), Xi.ret = TRUE, 
     close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)), 
     alc.gpu = FALSE, numstart = if(method[1] == "alcray") ncol(X) else 1, 
     rect = NULL, lite = TRUE, verb=0)
laGPsep.R(Xref, start, end, X, Z, d = NULL, g = 1/10000,
     method = c("alc", "alcopt", "alcray", "nn"), 
     Xi.ret = TRUE, pall = FALSE, 
     close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)),
     parallel = c("none", "omp", "gpu"), 
     numstart = if(method[1] == "alcray") ncol(X) else 1, 
     rect = NULL, lite = TRUE, verb = 0)

Arguments

Xref

a vector of length ncol(X) containing a single reference location; or a matrix with ncol(Xref) = ncol(X) containing multiple reference locations (unless method = "alcray") for simultaneous sub-design and prediction

start

the number of Nearest Neighbor (NN) locations for initialization; must specify start >= 6

end

the total size of the local designs; must have start < end

X

a matrix or data.frame containing the full (large) design matrix of input locations

Z

a vector of responses/dependent values with length(Z) = nrow(X)

d

a prior or initial setting for the lengthscale parameter for a Gaussian correlation function; a (default) NULL value causes a sensible regularization (prior) and initial setting to be generated via darg; a scalar specifies an initial value, causing darg to only generate the prior; otherwise, a list or partial list matching the output of darg can be used to specify a custom prior. In the case of a partial list, only the missing entries will be generated. Note that a default/generated list specifies MLE/MAP inference for this parameter. With laGPsep, the starting values can be an ncol(X)-by-nrow(XX) matrix or an ncol(X) vector

g

a prior or initial setting for the nugget parameter; a NULL value causes a sensible regularization (prior) and initial setting to be generated via garg; a scalar (default g = 1/10000) specifies an initial value, causing garg to only generate the prior; otherwise, a list or partial list matching the output of garg can be used to specify a custom prior. In the case of a partial list, only the missing entries will be generated. Note that a default/generated list specifies no inference for this parameter; i.e., it is fixed at its starting or default value, which may be appropriate for emulating deterministic computer code output. In such situations a value much smaller than the default may work even better (i.e., yield better out-of-sample predictive performance). The default was chosen conservatively

method

Specifies the method by which end-start candidates from X are chosen in order to predict at Xref. In brief, ALC ("alc", default) minimizes predictive variance; ALCRAY ("alcray") executes a thrifty ALC-based search focused on rays emanating from the reference location [must have nrow(Xref) = 1]; ALCOPT ("alcopt") optimizes a continuous ALC analog via derivatives and snaps the solution back to the candidate grid; MSPE ("mspe") augments ALC with extra derivative information to minimize mean-squared prediction error (requires extra computation); NN ("nn") uses nearest neighbor; and EFI ("fish") uses the expected Fisher information - essentially 1/G from Gramacy & Apley (2015) - which is a global heuristic, i.e., not localized to Xref.

Xi.ret

A scalar logical indicating whether or not a vector of indices into X, specifying the chosen sub-design, should be returned on output

pall

a scalar logical (for laGP.R only) offering the ability to obtain predictions after every update (for progress indication and debugging), rather than after just the last update

close

a non-negative integer end < close <= nrow(X) specifying the number of NNs (to Xref) in X to consider when searching for elements of the sub-design; close = 0 specifies all. For method="alcray" and method="alcopt", this argument specifies the scope used to snap solutions obtained via analog continuous searches back to elements of X; otherwise there are no restrictions on those searches. Since these approximate searches are cheaper, they can afford a larger “snapping scope”, hence the larger default

alc.gpu

a scalar logical indicating if a GPU should be used to parallelize the evaluation of the ALC criteria (method = "alc"). Requires the package be compiled with CUDA flags; see README/INSTALL in the package source for more details; currently only available for nrow(Xref) == 1 via laGP, not laGPsep or the .R variants, and only supports off-loading ALC calculation to the GPU

parallel

a switch indicating if any parallel calculation of the criteria is desired. Currently parallelization at this level is only provided for option method = "alc". For parallel = "omp", the package must be compiled with OMP flags; for parallel = "gpu", the package must be compiled with CUDA flags; see README/INSTALL in the package source for more details; currently only available via laGP.R

numstart

a scalar integer indicating the number of rays for each greedy search when method="alcray", or the number of restarts when method="alcopt". More rays or restarts lead to a more thorough, but more computationally intensive, search. This argument is not involved in other methods

rect

an optional 2-by-ncol(X) matrix describing a bounding rectangle for X that is used by the "alcray" method. If not specified, the rectangle is calculated from range applied to the columns of X

lite

Similar to the predGP option of the same name, this argument specifies whether (TRUE, the default) only the diagonal of the predictive covariance structure is returned, or (FALSE) the full covariance matrix. A full covariance structure requires more computation and more storage. This option is only relevant when nrow(Xref) > 1

verb

a non-negative integer specifying the verbosity level; verb = 0 is quiet, and larger values cause more progress information to be printed to the screen

Details

A sub-design of X of size end is built up according to the criteria prescribed by the method and then used to predict at Xref. The first start locations are NNs in order to initialize the first GP, via newGP or newGPsep, and thereby initialize the search. Each subsequent addition is found via the chosen criterion (method), and the GP fit is updated via updateGP or updateGPsep

The runtime is cubic in end, although the multiplicative “constant” out front depends on the method chosen, the size of the design X, and close. The "alcray" method has a smaller constant since it does not search over all candidates exhaustively.
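
As a rough empirical check of that scaling, the $time field returned on output (documented below) can be compared across increasing end values. A minimal sketch, assuming X, Z, and Xref as constructed in the Examples section:

ends <- c(25, 50, 100)
times <- sapply(ends, function(e) laGP(Xref, 6, e, X, Z)$time)
plot(ends, times, type="b", xlab="end", ylab="wall-clock time (s)")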

After building the sub-design, local MLE/MAP lengthscale (and/or nugget) parameters are estimated, depending on the d and g arguments supplied. This is facilitated by calls to mleGP or jmleGP.

Finally predGP is called on the resulting local GP model, and the parameters of the resulting Student-t distribution(s) are returned. Unless Xi.ret = FALSE, the indices of the local design are also returned.

laGP.R and laGPsep.R are prototype R-only versions for debugging and transparency purposes. They are slower than laGP and laGPsep, which are primarily in C, and may not provide identical output in all cases due to differing library implementations used as subroutines; see the note below for an example. laGP.R and other .R functions in the package may be useful for developing new programs that involve similar subroutines. The current version of laGP.R allows OpenMP and/or GPU parallelization of the criteria (method) if the package is compiled with the appropriate flags. See README/INSTALL in the package source for more information. For algorithmic details, see Gramacy, Niemi, & Weiss (2014)

Value

The output is a list with the following components.

mean

a vector of predictive means of length nrow(Xref)

s2

a vector of Student-t scale parameters of length nrow(Xref)

df

a Student-t degrees of freedom scalar (applies to all Xref)

llik

a scalar indicating the maximized log likelihood or log posterior probability of the data/parameter(s) under the chosen sub-design; provided up to an additive constant

time

a scalar giving the wall-clock time elapsed for (substantive parts of) the calculation

method

a copy of the method argument

d

a full-list version of the d argument, possibly completed by darg

g

a full-list version of the g argument, possibly completed by garg

mle

if d$mle and/or g$mle are TRUE, then mle is a data.frame containing the values found for these parameters, and the number of required iterations

Xi

when Xi.ret = TRUE, this field contains a vector of indices of length end into X indicating the sub-design chosen

close

a copy of the input argument

Note

laGPsep provides the same functionality as laGP but deploys a separable covariance function. However criteria (methods) EFI and MSPE are not supported. This is considered “beta” functionality at this time.

Note that using method="nn" gives the same result as specifying start=end, however at some extra computational expense.

Handling multiple reference locations (nrow(Xref) > 1) is “beta” functionality. In this case the initial start locations are chosen by applying NN to the average distances to all Xref locations. Using method="alcopt" causes exhaustive search to be approximated by a continuous analog via closed form derivatives. See alcoptGP for more details. Although the approximation provided has a spirit similar to method="alcray", in that both methods are intended to offer a thrifty alternative, method="alcray" is not applicable when nrow(Xref) > 1.

Differences between the C qsort function and R's order function may cause the chosen designs returned from laGP and laGP.R (and likewise from laGPsep and laGPsep.R) to differ when multiple X values are equidistant to Xref

Author(s)

Robert B. Gramacy [email protected] and Furong Sun [email protected]

References

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 9.) https://bobby.gramacy.com/surrogates/

F. Sun, R.B. Gramacy, B. Haaland, E. Lawrence, and A. Walker (2019). Emulating satellite drag from large simulation experiments, SIAM/ASA Journal on Uncertainty Quantification, 7(2), pp. 720-759; preprint on arXiv:1712.00182; https://arxiv.org/abs/1712.00182

R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R., Journal of Statistical Software, 72(1), 1-46; doi:10.18637/jss.v072.i01 or see vignette("laGP")

R.B. Gramacy and B. Haaland (2016). Speeding up neighborhood search in local Gaussian process prediction. Technometrics, 58(3), pp. 294-303; preprint on arXiv:1409.0074 https://arxiv.org/abs/1409.0074

R.B. Gramacy and D.W. Apley (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), pp. 561-578; preprint on arXiv:1303.0383; https://arxiv.org/abs/1303.0383

R.B. Gramacy, J. Niemi, R.M. Weiss (2014). Massively parallel approximate Gaussian process regression. SIAM/ASA Journal on Uncertainty Quantification, 2(1), pp. 568-584; preprint on arXiv:1310.5182; https://arxiv.org/abs/1310.5182

See Also

vignette("laGP"), aGP, newGP, updateGP, predGP, mleGP, jmleGP, alcGP, mspeGP, alcrayGP, randLine ## path-based local prediction via laGP

Examples

## examining a particular laGP call from the larger analysis provided
## in the aGP documentation

## A simple 2-d test function used in Gramacy & Apley (2015);
## thanks to Lee, Gramacy, Taddy, and others who have used it before
f2d <- function(x, y=NULL)
  {
    if(is.null(y)) {
      if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2)
      y <- x[,2]; x <- x[,1]
    }
    g <- function(z)
      return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1)))
    z <- -g(x)*g(y)
  }

## build up a design with N=~40K locations
x <- seq(-2, 2, by=0.02)
X <- as.matrix(expand.grid(x, x))
Z <- f2d(X)

## optional first pass of nearest neighbor
Xref <- matrix(c(-1.725, 1.725), nrow=1)
out <- laGP(Xref, 6, 50, X, Z, method="nn")

## second pass via ALC, ALCOPT, MSPE, and ALC-ray respectively,
## conditioned on MLE d-values found above
out2 <- laGP(Xref, 6, 50, X, Z, d=out$mle$d)
# out2.alcopt <- laGP(Xref, 6, 50, X, Z, d=out2$mle$d, method="alcopt")
out2.mspe <- laGP(Xref, 6, 50, X, Z, d=out2$mle$d, method="mspe")
out2.alcray <- laGP(Xref, 6, 50, X, Z, d=out2$mle$d, method="alcray")

## look at the different designs
plot(rbind(X[out2$Xi,], X[out2.mspe$Xi,]), type="n",
     xlab="x1", ylab="x2", main="comparing local designs")
points(Xref[1], Xref[2], col=2, cex=0.5)
text(X[out2$Xi,], labels=1:50, cex=0.6)
# text(X[out2.alcopt$Xi,], labels=1:50, cex=0.6, col="forestgreen")
text(X[out2.mspe$Xi,], labels=1:50, cex=0.6, col="blue")
text(X[out2.alcray$Xi,], labels=1:50, cex=0.6, col="red")
legend("right", c("ALC", "ALCOPT", "MSPE", "ALCRAY"),
       text.col=c("black", "forestgreen", "blue", "red"), bty="n")

## compare computational time
c(nn=out$time, alc=out2$time, # alcopt=out2.alcopt$time, 
  mspe=out2.mspe$time, alcray=out2.alcray$time)

## Not run: 
  ## Joint path sampling: a comparison between ALC-ex, ALC-opt and NN

  ## defining a predictive path
  wx <- seq(-0.85, 0.45, length=100)
  W <- cbind(wx-0.75, wx^3+0.51)

  ## three comparators from Sun, et al. (2017)
  ## larger-than-default "close" argument to capture locations nearby path
  p.alc <- laGPsep(W, 6, 100, X, Z, close=10000, lite=FALSE)
  p.alcopt <- laGPsep(W, 6, 100, X, Z, method="alcopt", lite=FALSE)
  ## note that close=10*(1000+end) would be the default for method = "alcopt"
  p.nn <- laGPsep(W, 6, 100, X, Z, method="nn", close=10000, lite=FALSE)

  ## time comparison
  c(alc=p.alc$time, alcopt=p.alcopt$time, nn=p.nn$time)

  ## visualization
  plot(W, type="l", xlab="x1", ylab="x2", xlim=c(-2.25,0), ylim=c(-0.75,1.25), lwd=2)
  points(X[p.alc$Xi,], col=2, cex=0.6)
  lines(W[,1]+0.25, W[,2]-0.25, lwd=2)
  points(X[p.nn$Xi,1]+0.25, X[p.nn$Xi,2]-0.25, pch=22, col=3, cex=0.6)
  lines(W[,1]-0.25, W[,2]+0.25, lwd=2) 
  points(X[p.alcopt$Xi,1]-0.25, X[p.alcopt$Xi,2]+0.25, pch=23, col=4, cex=0.6)
  legend("bottomright", c("ALC-opt", "ALC-ex", "NN"), pch=c(22, 21, 23), col=c(4,2,3))

## End(Not run)

Calculate a GP log likelihood

Description

Calculate a Gaussian process (GP) log likelihood or posterior probability with reference to a C-side GP object

Usage

llikGP(gpi, dab = c(0, 0), gab = c(0, 0))
llikGPsep(gpsepi, dab = c(0, 0), gab = c(0, 0))

Arguments

gpi

a C-side GP object identifier (positive integer); e.g., as returned by newGP

gpsepi

similar to gpi but indicating a separable GP object

dab

ab for the lengthscale parameter, see Details

gab

ab for the nugget parameter, see Details

Details

An “ab” parameter is a non-negative 2-vector describing shape and rate parameters to a Gamma prior; a zero-setting for either value results in no prior being used, in which case a log likelihood is returned. If both ab parameters are specified, then the value returned can be interpreted as a log posterior density. See darg for more information about ab
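
For intuition, the log prior contribution implied by a nonzero ab can be evaluated directly in R. A minimal sketch, with hypothetical shape ab[1] and rate ab[2] settings:

ab <- c(3/2, 8)  ## example shape and rate values
d <- 0.25        ## a candidate lengthscale value
dgamma(d, shape=ab[1], rate=ab[2], log=TRUE)  ## the log prior density at d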

Value

A real-valued scalar is returned.

Author(s)

Robert B. Gramacy [email protected]

See Also

mleGP, darg

Examples

## partly following the example in mleGP
if(require("MASS")) {

  ## motorcycle data and predictive locations
  X <- matrix(mcycle[,1], ncol=1)
  Z <- mcycle[,2]

  ## get sensible ranges
  d <- darg(NULL, X)
  g <- garg(list(mle=TRUE), Z)
  
  ## initialize the model
  gpi <- newGP(X, Z, d=d$start, g=g$start)

  ## calculate log likelihood
  llikGP(gpi)
  ## calculate posterior probability
  llikGP(gpi, d$ab, g$ab)

  ## clean up
  deleteGP(gpi)
}

Inference for GP correlation parameters

Description

Maximum likelihood/a posteriori inference for (isotropic and separable) Gaussian lengthscale and nugget parameters, marginally or jointly, for Gaussian process regression

Usage

mleGP(gpi, param = c("d", "g"), tmin=sqrt(.Machine$double.eps), 
  tmax = -1, ab = c(0, 0), verb = 0)
mleGPsep(gpsepi, param=c("d", "g", "both"), tmin=rep(sqrt(.Machine$double.eps), 2),
  tmax=c(-1,1), ab=rep(0,4), maxit=100, verb=0)
mleGPsep.R(gpsepi, param=c("d", "g"), tmin=sqrt(.Machine$double.eps), 
  tmax=-1, ab=c(0,0), maxit=100, verb=0)
jmleGP(gpi, drange=c(sqrt(.Machine$double.eps),10), 
  grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0)
jmleGP.R(gpi, N=100, drange=c(sqrt(.Machine$double.eps),10), 
  grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0)
jmleGPsep(gpsepi, drange=c(sqrt(.Machine$double.eps),10), 
  grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), 
  maxit=100, verb=0)
jmleGPsep.R(gpsepi, N=100, drange=c(sqrt(.Machine$double.eps),10), 
  grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), 
  maxit=100, mleGPsep=mleGPsep.R, verb=0)

Arguments

gpi

a C-side GP object identifier (positive integer); e.g., as returned by newGP

gpsepi

similar to gpi but indicating a separable GP object, as returned by newGPsep

N

for jmleGP.R, the maximum number of times the pair of margins should be iterated over before determining failed convergence; note that (at this time) jmleGP uses a hard-coded N=100 in its C implementation

param

for mleGP, indicating whether to work on the lengthscale (d) or nugget (g) margin

tmin

for mleGP, smallest value considered for the parameter (param)

tmax

for mleGP, largest value considered for the parameter (param); a setting of -1 for lengthscales, the default, causes ncol(X)^2 to be used

drange

for jmleGP, these are c(tmin, tmax) values for the lengthscale parameter; the default values are reasonable for 1-d inputs in the unit interval

grange

for jmleGP, these are c(tmin, tmax) values for the nugget parameter; the default values are reasonable for responses with a range of one

ab

for mleGP, a non-negative 2-vector describing shape and rate parameters to a Gamma prior for the parameter (param); a zero-setting for either value results in no-prior being used (MLE inference); otherwise MAP inference is performed

maxit

for mleGPsep this is passed as control=list(maxit=maxit) to optim's L-BFGS-B method for optimizing the likelihood/posterior of a separable GP representation; this argument is not used for isotropic GP versions, nor for optimizing the nugget

dab

for jmleGP, this is ab for the lengthscale parameter

gab

for jmleGP, this is ab for the nugget parameter

mleGPsep

function for internal MLE calculation of the separable lengthscale parameter; one of either mleGPsep.R based on method="L-BFGS-B" using optim; or mleGPsep using the C entry point lbfgsb. Both options use a C backend for the nugget

verb

a verbosity argument indicating how much information about the optimization steps should be printed to the screen; verb <= 0 is quiet; for jmleGP, a verb - 1 value is passed to the mleGP or mleGPsep subroutine(s)

Details

mleGP and mleGPsep perform marginal (or profile) inference for the specified param, either the lengthscale or the nugget. mleGPsep can perform simultaneous lengthscale and nugget inference via a common gradient with param = "both". More details are provided below.

For the lengthscale, mleGP uses a Newton-like scheme with analytic first and second derivatives (more below) to find the scalar parameter for the isotropic Gaussian correlation function, with a hard-coded 100-max iterations threshold and a sqrt(.Machine$double.eps) tolerance for determining convergence; mleGPsep.R uses L-BFGS-B via optim for the vectorized parameter of the separable Gaussian correlation, with a user-supplied maximum number of iterations (maxit) passed to optim. When maxit is reached, the output conv = 1 is returned; subsequent identical calls to mleGPsep.R can be used to continue with further iterations. mleGPsep is similar, but uses the C entry point lbfgsb.
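
A minimal sketch of that continuation pattern, assuming a gpsepi object allocated with dK=TRUE and a d range generated by darg:

mle <- mleGPsep.R(gpsepi, param="d", tmin=d$min, tmax=d$max, maxit=50)
while(mle$conv == 1)  ## maxit reached; continue from the current parameters
  mle <- mleGPsep.R(gpsepi, param="d", tmin=d$min, tmax=d$max, maxit=50)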

For the nugget, both mleGP and mleGPsep utilize a (nearly identical) Newton-like scheme leveraging first and second derivatives.

jmleGP and jmleGPsep provide joint inference by iterating over the marginals of lengthscale and nugget. The jmleGP.R function is an R-only wrapper around mleGP (which is primarily in C), whereas jmleGP is primarily in C but with reduced output and with hard-coded N=100. The same is true for jmleGPsep.

mleGPsep provides a param = "both" alternative to jmleGPsep leveraging a common gradient. It can be helpful to supply a larger maxit argument compared to jmleGPsep as the latter may do up to 100 outer iterations, cycling over lengthscale and nugget. mleGPsep usually requires many fewer total iterations, unless one of the lengthscale or nugget is already converged. In anticipation of param = "both" the mleGPsep function has longer default values for its bounds and prior arguments. These longer arguments are ignored when param != "both". At this time mleGP does not have a param = "both" option.

All methods are initialized at the value of the parameter(s) currently stored by the C-side object referenced by gpi or gpsepi. It is highly recommended that sensible range values (tmin, tmax or drange, grange) be provided. The defaults provided are too loose for most applications. As illustrated in the examples below, the darg and garg functions can be used to set appropriate ranges from the distributions of inputs and output data respectively.

The Newton-like method implemented for the isotropic lengthscale and for the nugget offers very fast convergence to local maxima, but sometimes it fails to converge (for any of the usual reasons). The implementation detects this, and in such cases it invokes a Brent_fmin call instead - this is the method behind the optimize function.

Note that the gpi or gpsepi object(s) must have been allocated with dK=TRUE; alternatively, one can call buildKGP or buildKGPsep - however, this is not in the NAMESPACE at this time

Value

A self-explanatory data.frame is returned containing the values inferred and the number of iterations used. The jmleGP.R and jmleGPsep.R functions will also show progress details (the values obtained after each iteration over the marginals).

However, the most important “output” is the modified GP object which retains the setting of the parameters reported on output as a side effect.

mleGPsep and jmleGPsep provide an output field/column called conv indicating convergence (when 0), or alternately a value agreeing with a non-convergence code provided on output by optim

Author(s)

Robert B. Gramacy [email protected]

References

For standard GP inference, refer to any graduate text, e.g., Rasmussen & Williams Gaussian Processes for Machine Learning, or

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 5.) https://bobby.gramacy.com/surrogates/

See Also

vignette("laGP"), newGP, laGP, llikGP, optimize

Examples

## a simple example with estimated nugget
if(require("MASS")) {

  ## motorcycle data and predictive locations
  X <- matrix(mcycle[,1], ncol=1)
  Z <- mcycle[,2]

  ## get sensible ranges
  d <- darg(NULL, X)
  g <- garg(list(mle=TRUE), Z)

  ## initialize the model
  gpi <- newGP(X, Z, d=d$start, g=g$start, dK=TRUE)

  ## separate marginal inference (not necessary - for illustration only)
  print(mleGP(gpi, "d", d$min, d$max))
  print(mleGP(gpi, "g", g$min, g$max))

  ## joint inference (could skip straight to here)
  print(jmleGP(gpi, drange=c(d$min, d$max), grange=c(g$min, g$max)))

  ## plot the resulting predictive surface
  N <- 100
  XX <- matrix(seq(min(X), max(X), length=N), ncol=1)
  p <- predGP(gpi, XX, lite=TRUE)
  plot(X, Z, main="stationary GP fit to motorcycle data")
  lines(XX, p$mean, lwd=2)
  lines(XX, p$mean+1.96*sqrt(p$s2*p$df/(p$df-2)), col=2, lty=2)
  lines(XX, p$mean-1.96*sqrt(p$s2*p$df/(p$df-2)), col=2, lty=2)

  ## clean up
  deleteGP(gpi)
}

## 
## with a separable correlation function 
##

## 2D Example: GoldPrice Function, mimicking GP_fit package
f <- function(x) 
{
  x1 <- 4*x[,1] - 2
  x2 <- 4*x[,2] - 2;
  t1 <- 1 + (x1 + x2 + 1)^2*(19 - 14*x1 + 3*x1^2 - 14*x2 + 6*x1*x2 + 3*x2^2);
  t2 <- 30 + (2*x1 -3*x2)^2*(18 - 32*x1 + 12*x1^2 + 48*x2 - 36*x1*x2 + 27*x2^2);
  y <- t1*t2;
  return(y)
}

## build design
library(tgp)
n <- 50 ## change to 100 or 1000 for more interesting demo
B <- rbind(c(0,1), c(0,1))
X <- dopt.gp(n, Xcand=lhs(10*n, B))$XX
## this differs from GP_fit in that we use the log response
Y <- log(f(X))

## get sensible ranges
d <- darg(NULL, X)
g <- garg(list(mle=TRUE), Y)

## build GP and jointly optimize via profile methods
gpisep <- newGPsep(X, Y, d=rep(d$start, 2), g=g$start, dK=TRUE)
jmleGPsep(gpisep, drange=c(d$min, d$max), grange=c(g$min, g$max))

## clean up
deleteGPsep(gpisep)

## alternatively, we can optimize via a combined gradient
gpisep <- newGPsep(X, Y, d=rep(d$start, 2), g=g$start, dK=TRUE)
mleGPsep(gpisep, param="both", tmin=c(d$min, g$min), tmax=c(d$max, g$max))
deleteGPsep(gpisep)

Create A New GP Object

Description

Build a Gaussian process C-side object based on the X-Z data and parameters provided, and augment those objects with new data

Usage

newGP(X, Z, d, g, dK = FALSE)
newGPsep(X, Z, d, g, dK = FALSE)
updateGP(gpi, X, Z, verb = 0)
updateGPsep(gpsepi, X, Z, verb = 0)

Arguments

X

a matrix or data.frame containing the full (large) design matrix of input locations

Z

a vector of responses/dependent values with length(Z) = nrow(X)

d

a positive scalar lengthscale parameter for an isotropic Gaussian correlation function (newGP); or a vector for a separable version (newGPsep)

g

a positive scalar nugget parameter

dK

a scalar logical indicating whether or not derivative information (for the lengthscale) should be maintained by the GP object; this is required for calculating MLEs/MAPs of the lengthscale parameter(s) via mleGP and jmleGP

gpi

a C-side GP object identifier (positive integer); e.g., as returned by newGP

gpsepi

similar to gpi but indicating a separable GP object, as returned by newGPsep

verb

a non-negative integer indicating the verbosity level. A positive value causes progress statements to be printed to the screen for each update of i in 1:nrow(X)

Details

newGP allocates a new GP object on the C-side and returns its unique integer identifier (gpi), taking time which is cubic in nrow(X); allocated GP objects must (eventually) be destroyed with deleteGP or deleteGPs or memory will leak. The same applies for newGPsep, except deploying a separable correlation with a limited feature set; see deleteGPsep and deleteGPseps

updateGP takes a gpi identifier as input and augments that GP with new data. A sequence of updates is performed, for each i in 1:nrow(X), each taking time which is quadratic in the number of data points. updateGP also updates any statistics needed in order to quickly search for new local design candidates via laGP. The same applies to updateGPsep on gpsepi objects

Value

newGP and newGPsep create a unique GP indicator (gpi or gpsepi) referencing a C-side object; updateGP and updateGPsep do not return anything, but yield a modified C-side object as a side effect

Author(s)

Robert B. Gramacy [email protected]

References

For standard GP inference, refer to any graduate text, e.g., Rasmussen & Williams Gaussian Processes for Machine Learning, or

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 6.) https://bobby.gramacy.com/surrogates/

For efficient updates of GPs, see:

R.B. Gramacy and D.W. Apley (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), pp. 561-578; preprint on arXiv:1303.0383; https://arxiv.org/abs/1303.0383

See Also

vignette("laGP"), deleteGP, mleGP, predGP, laGP

Examples

## for more examples, see predGP and mleGP docs

## simple sine data
X <- matrix(seq(0,2*pi,length=7), ncol=1)
Z <- sin(X)

## new GP fit
gpi <- newGP(X, Z, 2, 0.000001)

## make predictions
XX <- matrix(seq(-1,2*pi+1, length=499), ncol=ncol(X))
p <- predGP(gpi, XX)

## sample from the predictive distribution
if(require(mvtnorm)) {
  N <- 100
  ZZ <- rmvt(N, p$Sigma, p$df) 
  ZZ <- ZZ + t(matrix(rep(p$mean, N), ncol=N))
  matplot(XX, t(ZZ), col="gray", lwd=0.5, lty=1, type="l", 
         xlab="x", ylab="f-hat(x)", bty="n")
  points(X, Z, pch=19, cex=2)
}

## update with four more points
X2 <- matrix(c(pi/2, 3*pi/2, -0.5, 2*pi+0.5), ncol=1)
Z2 <- sin(X2)
updateGP(gpi, X2, Z2)

## make a new set of predictions
p2 <- predGP(gpi, XX)
if(require(mvtnorm)) {
  ZZ <- rmvt(N, p2$Sigma, p2$df) 
  ZZ <- ZZ + t(matrix(rep(p2$mean, N), ncol=N))
  matplot(XX, t(ZZ), col="gray", lwd=0.5, lty=1, type="l", 
         xlab="x", ylab="f-hat(x)", bty="n")
  points(X, Z, pch=19, cex=2)
  points(X2, Z2, pch=19, cex=2, col=2)
}

## clean up
deleteGP(gpi)

Optimize an objective function under multiple blackbox constraints

Description

Uses a surrogate modeled augmented Lagrangian (AL) system to optimize an objective function (blackbox or known and linear) under unknown multiple (blackbox) constraints via expected improvement (EI) and variations; a comparator based on EI with constraints is also provided

Usage

optim.auglag(fn, B, fhat = FALSE, equal = FALSE, ethresh = 1e-2, 
  slack = FALSE, cknown = NULL, start = 10, end = 100, 
  Xstart = NULL, sep = TRUE, ab = c(3/2, 8), lambda = 1, rho = NULL, 
  urate = 10, ncandf = function(t) { t }, dg.start = c(0.1, 1e-06), 
  dlim = sqrt(ncol(B)) * c(1/100, 10), Bscale = 1, ey.tol = 1e-2, 
  N = 1000, plotprog = FALSE, verb = 2, ...)
optim.efi(fn, B, fhat = FALSE, cknown = NULL, start = 10, end = 100, 
  Xstart = NULL, sep = TRUE, ab = c(3/2,8), urate = 10, 
  ncandf = function(t) { t }, dg.start = c(0.1, 1e-6), 
  dlim = sqrt(ncol(B))*c(1/100,10), Bscale = 1, plotprog = FALSE, 
  verb = 2, ...)

Arguments

fn

function of an input (x), facilitating vectorization on a matrix X thereof, returning a list with elements $obj containing the (scalar) objective value and $c containing a vector of evaluations of the (multiple) constraint function at x. The fn function must take a known.only argument which is explained in the note below; it need not act on that argument

B

2-column matrix describing the bounding box. The number of rows of the matrix determines the input dimension (length(x) in fn(x)); the first column gives lower bounds and the second gives upper bounds

fhat

a scalar logical indicating if the objective function should be modeled with a GP surrogate. The default of FALSE assumes a known linear objective scaled by Bscale. Using TRUE is an “alpha” feature at this time

equal

an optional vector containing zeros and ones, whose length equals the number of constraints, specifying which should be treated as equality constraints (1) and which as inequality (0); the default equal = FALSE treats all constraints as inequalities

ethresh

a threshold used for equality constraints to determine validity for progress measures; ignored if there are no equality constraints

slack

A scalar logical indicating if slack variables, and thus exact EI calculations, should be used. The default of slack = FALSE results in Monte Carlo EI approximation. One can optionally specify slack = 2 to get the slack = TRUE behavior, with a second-stage L-BFGS-B optimization of the EI acquisition applied at the end, starting from the best value found on the random search grid

cknown

An optional positive integer vector specifying which of the constraint values returned by fn should be treated as “known”, i.e., not modeled with Gaussian processes

start

positive integer giving the number of random starting locations before sequential design (for optimization) is performed; start >= 6 is recommended unless Xstart is non-NULL; in the current version the starting locations come from a space-filling design via dopt.gp

end

positive integer giving the total number of evaluations/trials in the optimization; must have end > start

Xstart

optional matrix of starting design locations in lieu of, or in addition to, start random ones; we recommend nrow(Xstart) + start >= 6; also must have ncol(Xstart) = nrow(B)

sep

The default sep = TRUE uses separable GPs (i.e., via newGPsep, etc.) to model the constraints and objective; otherwise isotropic GPs are used

ab

prior parameters; see darg for a description of the prior used on the lengthscale parameter during emulation(s) for the constraints

lambda

m-dimensional initial Lagrange multiplier parameter for m constraints

rho

positive scalar initial quadratic penalty parameter in the augmented Lagrangian; the default setting of rho = NULL causes an automatic starting value to be chosen; see rejoinder to Gramacy, et al. (2016) or supplementary material to Picheny, et al. (2016)

urate

positive integer indicating how many optimization trials should pass before each MLE/MAP update is performed for GP correlation lengthscale parameter(s)

ncandf

function taking a single integer indicating the optimization trial number t, where start < t <= end, and returning the number of search candidates (e.g., for expected improvement calculations) at round t; the default setting allows the number of candidates to grow linearly with t

dg.start

2-vector giving starting values for the lengthscale and nugget parameters of the GP surrogate model(s) for constraints

dlim

2-vector giving bounds for the lengthscale parameter(s) under MLE/MAP inference, thereby augmenting the prior specification in ab

Bscale

scalar indicating the relationship between the sum of the inputs, sum(x), to fn and the output fn(x)$obj; note that at this time only linear objectives are fully supported by the code - more details below

ey.tol

a scalar proportion indicating how many of the EIs at ncandf(t) candidate locations must be non-zero to “trust” that metric to guide search, reducing to an EY-based search instead [choosing that proportion to be one forces EY-based search]

N

positive scalar integer indicating the number of Monte Carlo samples to be used for calculating EI and EY

plotprog

logical indicating if progress plots should be made after each inner iteration; the plots show three panels tracking the best valid objective, the EI or EY surface over the first two input variables (requires interp), and the parameters of the lengthscale(s) of the GP(s), respectively. When plotprog = TRUE the interp.loess function is used to aid in creating surface plots, however this does not work well with fewer than fifteen points. You may also provide a function as an argument, having similar arguments/formals as interp.loess. For example, we use interp below, which would have been the default if not for licensing incompatibilities

verb

a non-negative integer indicating the verbosity level; the larger the value the more that is printed to the screen

...

additional arguments passed to fn

Details

These subroutines support a suite of methods used to optimize challenging constrained problems from Gramacy, et al. (2016) and from Picheny, et al. (2016); see the references below.

Those schemes hybridize Gaussian process based surrogate modeling and expected improvement (EI; Jones, et al., 1998) with the additive penalty method (APM) implemented by the augmented Lagrangian (AL, e.g., Nocedal & Wright, 2006). The goal is to minimize a (possibly known) linear objective function f(x) under multiple, unknown (blackbox) constraint functions satisfying c(x) <= 0, which is vector-valued. The solution here emulates the components of c with Gaussian process surrogates, and guides optimization by searching the posterior mean surface, or the EI, of the following composite objective

Y(x) = f(x) + \lambda^\top Y_c(x) + \frac{1}{2\rho} \sum_{i=1}^m \max(0, Y_{c_i}(x))^2,

where lambda and rho follow updating equations that guarantee convergence to a valid solution minimizing the objective. For more details, see Gramacy, et al. (2016).
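
As a concrete illustration of the composite (a minimal sketch, not the package's internal implementation), with a known objective value f, a vector Yc of constraint surrogate draws, and given lambda and rho:

al <- function(f, Yc, lambda, rho)
  f + sum(lambda * Yc) + sum(pmax(0, Yc)^2) / (2 * rho)
al(1, c(-0.2, 0.4), lambda=c(1, 1), rho=0.5)  ## 1 + 0.2 + 0.16 = 1.36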

A slack variable implementation that allows for exact EI calculations and can accommodate equality constraints, and mixed (equality and inequality) constraints, is also provided. For further details, see Picheny, et al. (2016).

The example below illustrates a variation on the toy problem considered in both papers, which bases sampling on EI. For examples making use of equality constraints, following the Picheny, et al. (2016) paper, see the demos listed in the “See Also” section below.

Although it is off by default, these functions allow an unknown objective to be modeled (fhat = TRUE), rather than assuming a known linear one. For examples see demo("ALfhat") and demo("GSBP") which illustrate the AL and comparators in inequality and mixed constraints setups, respectively.

The optim.efi function is provided as a comparator. This method uses the same underlying GP models with the hybrid EI and probability of satisfying the constraints heuristic from Schonlau, et al., (1998). See demo("GSBP") and demo("LAH") for optim.efi examples and comparisons between the original AL and the slack variable enhancement(s) on mixed constraint problems with known and blackbox objectives, respectively

Value

The output is a list summarizing the progress of the evaluations of the blackbox under optimization

prog

vector giving the best valid (c(x) < 0) value of the objective over the trials

obj

vector giving the value of the objective for the input under consideration at each trial

X

matrix giving the input values at which the blackbox function was evaluated

C

matrix giving the value of the constraint function for the input under consideration at each trial

d

matrix of lengthscale values obtained at the final update of the GP emulator for each constraint

df

if fhat = TRUE then this is a matrix of lengthscale values for the objective obtained at the final update of the GP emulator

lambda

a matrix containing lambda vectors used in each “outer loop” AL iteration

rho

a vector of rho values used in each “outer loop” AL iteration

Note

This function is under active development, especially the newest features including separable GP surrogate modeling, surrogate modeling of a blackbox objective, and the use of slack variables for exact EI calculations and the support of equality constraints. Also note that, compared with earlier versions, it is now required to augment your blackbox function (fn) with an argument named known.only. This allows the user to specify if a potentially different object (with a subset of the outputs, those that are “known”) gets returned in certain circumstances. For example, the objective is treated as known in many of our examples. When a non-null cknown object is used, the known.only flag can be used to return only the outputs which are known.

Older versions of this function provided an argument called nomax. The NoMax feature is no longer supported

Author(s)

Robert B. Gramacy [email protected]

References

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 7.) https://bobby.gramacy.com/surrogates/

Picheny, V., Gramacy, R.B., Wild, S.M., Le Digabel, S. (2016). “Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian”. Preprint available on arXiv:1605.09466; https://arxiv.org/abs/1605.09466

Gramacy, R.B, Gray, G.A, Lee, H.K.H, Le Digabel, S., Ranjan P., Wells, G., Wild, S.M. (2016) “Modeling an Augmented Lagrangian for Improved Blackbox Constrained Optimization”, Technometrics (with discussion), 58(1), 1-11. Preprint available on arXiv:1403.4890; https://arxiv.org/abs/1403.4890

Jones, D., Schonlau, M., and Welch, W. J. (1998). “Efficient Global Optimization of Expensive Black Box Functions.” Journal of Global Optimization, 13, 455-492.

Schonlau, M., Jones, D.R., and Welch, W. J. (1998). “Global Versus Local Search in Constrained Optimization of Computer Models.” In New Developments and Applications in Experimental Design, vol. 34, 11-25. Institute of Mathematical Statistics.

Nocedal, J. and Wright, S.J. (2006). Numerical Optimization. 2nd ed. Springer.

See Also

vignette("laGP"), demo("ALfhat") for blackbox objective, demo("GSBP") for a mixed constraints problem with blackbox objective, demo("LAH") for mix constraints with known objective, optim.step.tgp for unconstrained optimization; optim with method="L-BFGS-B" for box constraints, or optim with method="SANN" for simulated annealing

Examples

## this example assumes a known linear objective; further examples
## are in the optim.auglag demo

## a test function returning linear objective evaluations and 
## non-linear constraints
aimprob <- function(X, known.only = FALSE)
{
  if(is.null(nrow(X))) X <- matrix(X, nrow=1)
  f <- rowSums(X)
  if(known.only) return(list(obj=f))
  c1 <- 1.5-X[,1]-2*X[,2]-0.5*sin(2*pi*(X[,1]^2-2*X[,2]))
  c2 <- rowSums(X^2)-1.5
  return(list(obj=f, c=cbind(c1,c2)))
}

## set bounding rectangle for adaptive sampling
B <- matrix(c(rep(0,2),rep(1,2)),ncol=2)

## optimization (primarily) by EI, change 25 to 100 for
## 99% chance of finding the global optimum with value 0.6
if(require("interp")) { ## for plotprog=interp
  out <- optim.auglag(aimprob, B, end=25, plotprog=interp)
} else {
  out <- optim.auglag(aimprob, B, end=25)
}

## using the slack variable implementation which is a little slower
## but more precise; slack=2 augments random search with L-BFGS-B
  
out2 <- optim.auglag(aimprob, B, end=25, slack=TRUE)
## Not run: 
out3 <- optim.auglag(aimprob, B, end=25, slack=2)

## End(Not run)

## for more slack examples and comparison to optim.efi on problems
## involving equality and mixed (equality and inequality) constraints,
## see demo("ALfhat"), demo("GSBP") and demo("LAH")

## for comparison, here is a version that uses simulated annealing
## with the Additive Penalty Method (APM) for constraints
## Not run: 
aimprob.apm <- function(x, B=matrix(c(rep(0,2),rep(1,2)),ncol=2))
{ 
  ## check bounding box
  for(i in 1:length(x)) {
    if(x[i] < B[i,1] || x[i] > B[i,2]) return(Inf)
  }

  ## evaluate objective and constraints
  f <- sum(x)
  c1 <- 1.5-x[1]-2*x[2]-0.5*sin(2*pi*(x[1]^2-2*x[2]))
  c2 <- x[1]^2+x[2]^2-1.5

  ## return APM composite
  return(f + abs(c1) + abs(c2))
}

## use SA; specify control=list(maxit=100), say, to control max 
## number of iterations; does not easily facilitate plotting progress
out4 <- optim(runif(2), aimprob.apm, method="SANN") 
## check the final value, which typically does not satisfy both
## constraints
aimprob(out4$par)

## End(Not run)

## for a version with a modeled objective see demo("ALfhat")

GP Prediction/Kriging

Description

Perform Gaussian processes prediction (under isotropic or separable formulation) at new XX locations using a GP object stored on the C-side

Usage

predGP(gpi, XX, lite = FALSE, nonug = FALSE)
predGPsep(gpsepi, XX, lite = FALSE, nonug = FALSE)

Arguments

gpi

a C-side GP object identifier (positive integer); e.g., as returned by newGP

gpsepi

similar to gpi but indicating a separable GP object, as returned by newGPsep

XX

a matrix or data.frame containing a design of predictive locations

lite

a scalar logical indicating whether a full predictive covariance matrix (lite = FALSE, the default) or only its diagonal (lite = TRUE) should be returned; the full matrix is required, e.g., for plotting random sample paths, but substantially increases computation time when only point-prediction is required

nonug

a scalar logical indicating if a (nonzero) nugget should be used in the predictive equations; this allows the user to toggle between visualizations of uncertainty due just to the mean, and a full quantification of predictive uncertainty. The latter (default nonug = FALSE) is the standard approach, but the former may work better in some sequential design contexts. See, e.g., ieciGP

Details

Returns the parameters of Student-t predictive equations. By default, these include a full predictive covariance matrix between all XX locations. However, this can be slow when nrow(XX) is large, so a lite option is provided, which only returns the diagonal of that matrix.

GP prediction is sometimes called “kriging”, especially in the spatial statistics literature. So this function could also be described as returning evaluations of the “kriging equations”
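
For example, quantile-based predictive intervals follow directly from the returned Student-t parameters. A short sketch, assuming a gpi object and predictive grid XX as in the example below:

p <- predGP(gpi, XX, lite=TRUE)
lo <- p$mean + qt(0.05, p$df)*sqrt(p$s2)  ## lower 90% predictive bound
hi <- p$mean + qt(0.95, p$df)*sqrt(p$s2)  ## upper 90% predictive bound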

Value

The output is a list with the following components.

mean

a vector of predictive means of length nrow(XX)

Sigma

covariance matrix for a multivariate Student-t distribution; alternately if lite = TRUE, then a field s2 contains the diagonal of this matrix

df

a Student-t degrees of freedom scalar (applies to all XX)

Author(s)

Robert B. Gramacy [email protected]

References

For standard GP prediction, refer to any graduate text, e.g., Rasmussen & Williams Gaussian Processes for Machine Learning, or

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 5.) https://bobby.gramacy.com/surrogates/

See Also

vignette("laGP"), newGP, mleGP, jmleGP

Examples

## a "computer experiment" -- a much smaller version than the one shown
## in the aGP docs

## Simple 2-d test function used in Gramacy & Apley (2015);
## thanks to Lee, Gramacy, Taddy, and others who have used it before
f2d <- function(x, y=NULL)
  {
    if(is.null(y)) {
      if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2)
      y <- x[,2]; x <- x[,1]
    }
    g <- function(z)
      return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1)))
    z <- -g(x)*g(y)
  }

## design with N=441
x <- seq(-2, 2, length=11)
X <- expand.grid(x, x)
Z <- f2d(X)

## fit a GP
gpi <- newGP(X, Z, d=0.35, g=1/1000)

## predictive grid with NN=400
xx <- seq(-1.9, 1.9, length=20)
XX <- expand.grid(xx, xx)
ZZ <- f2d(XX)

## predict
p <- predGP(gpi, XX, lite=TRUE)
## RMSE: compare to similar experiment in aGP docs
sqrt(mean((p$mean - ZZ)^2))

## visualize the result
par(mfrow=c(1,2))
image(xx, xx, matrix(p$mean, nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="predictive mean")
image(xx, xx, matrix(p$mean-ZZ, nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="bias")

## clean up
deleteGP(gpi)

## see the newGP and mleGP docs for examples using lite = FALSE for
## sampling from the joint predictive distribution

Generate two-dimensional random paths

Description

Generate two-dimensional random paths (one-dimensional manifolds in 2d) comprising different randomly chosen line types: linear, quadratic, cubic, exponential, and natural logarithm. If the input dimensionality is higher than 2, then a line in two randomly chosen input coordinates is generated

Usage

randLine(a, D, N, smin, res)

Arguments

a

a fixed two-element vector denoting the range of the bounding box (lower bound and upper bound) of all input coordinates

D

a scalar denoting the dimensionality of input space

N

a scalar denoting the desired total number of random lines

smin

a scalar denoting the minimum absolute scaling constant, i.e., the length of the shortest line that could be generated

res

a scalar denoting the number of data points, i.e., the resolution on the random path

Details

This two-dimensional random line generating function produces different types of 2d random paths, including linear, quadratic, cubic, exponential, and natural logarithm.

First, one of these line types is chosen uniformly at random. The line is then drawn, via a collection of discrete points, from the origin according to the arguments, e.g., resolution and length, provided by the user. The discrete set of coordinates are then shifted and scaled, uniformly at random, into the specified 2d rectangle, e.g., [-2,2]^2, with the restriction that at least half of the points comprising the line lie within the rectangle.

For a quick visualization, see Figure 15 in Sun, et al. (2017). Figure 7 in the same manuscript illustrates the application of this function in out-of-sample prediction using laGPsep, in 2d and 4d, respectively.

randLine returns different types of random paths and the indices of the randomly selected pair, i.e., subset, of input coordinates (when D > 2).

Value

randLine returns a list of lists. The outer list is of length six, representing each of the five possible line types (linear, quadratic, cubic, exponential, and natural logarithm), with the sixth entry providing the randomly chosen input dimensions.

The inner lists are comprised of res-by-2 data.frames, the number of which span N samples across all inner lists.

Note

Users should scale each coordinate of the global input space to the same coded range, e.g., [-2,2]^D, in order to avoid the computational burden caused by passing a global input space argument. Users may convert back to the natural units when necessary.
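
A one-line sketch of such coding, where X01 is a hypothetical design whose columns are already scaled to [0,1]:

X <- 4*X01 - 2  ## maps [0,1]^D onto the coded range [-2,2]^D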

Author(s)

Furong Sun [email protected] and Robert B. Gramacy [email protected]

References

F. Sun, R.B. Gramacy, B. Haaland, E. Lawrence, and A. Walker (2019). Emulating satellite drag from large simulation experiments, SIAM/ASA Journal on Uncertainty Quantification, 7(2), pp. 720-759; preprint on arXiv:1712.00182; https://arxiv.org/abs/1712.00182

See Also

laGPsep, aGPsep

Examples

## 1. visualization of the randomly generated paths

## generate the paths
D <- 4
a <- c(-2, 2)
N <- 30
smin <- 0.1
res <- 100
line.set <- randLine(a=a, D=D, N=N, smin=smin, res=res)
  
## the indices of the randomly selected pair of input coordinates
d <- line.set$d

## visualization

## first create an empty plot
par(mar=c(5, 4, 6, 2) + 0.1)
plot(0, xlim=a, ylim=a, type="l", xlab=paste("factor ", d[1], sep=""), 
     ylab=paste("factor ", d[2], sep=""), main="2d random paths", 
     cex.lab=1.5, cex.main=2)
abline(h=(a[1]+a[2])/2, v=(a[1]+a[2])/2, lty=2)

## merge each path type together
W <- unlist(list(line.set$lin, line.set$qua, line.set$cub, line.set$ep, line.set$ln), 
  recursive=FALSE)

## calculate colors to retain 
n <- unlist(lapply(line.set, length)[-6])
cols <- rep(c("orange", "blue", "forestgreen", "magenta", "cornflowerblue"), n)

## plot randomly generated paths with a centering dot in red at the midway point
for(i in 1:N){
  lines(W[[i]][,1], W[[i]][,2], col=cols[i])
  points(W[[i]][res/2,1], W[[i]][res/2,2], col=2, pch=20)
}

## add legend
legend("top", legend=c("lin", "qua", "cub", "exp", "log"), cex=1.5, bty="n",
       xpd=TRUE, horiz=TRUE, inset=c(0, -0.085), lty=rep(1, 5), lwd=rep(1, 5),
       col=c("orange", "blue", "forestgreen", "magenta", "cornflowerblue"))

## 2. use the random paths for out-of-sample prediction via laGPsep

## test function (same 2d function as in other examples package)
## (ignoring 4d nature of path generation above)
f2d <- function(x, y=NULL){
  if(is.null(y)){
     if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2)
     y <- x[,2]; x <- x[,1]
  }
  g <- function(z)
  return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1)))
  z <- -g(x)*g(y)
}
    
## generate training data using 2d input space
x <- seq(a[1], a[2], by=0.02)
X <- as.matrix(expand.grid(x, x))
Y <- f2d(X)

## example of joint path calculation followed by RMSE calculation
## on a randomly chosen path
WW <- W[[sample(1:N, 1)]]
WY <- f2d(WW)

## exhaustive search via "joint" ALC
j.exh <- laGPsep(WW, 6, 100, X, Y, method="alc", close=10000, lite=FALSE)
sqrt(mean((WY - j.exh$mean)^2)) ## RMSE

## repeat for all thirty path elements (way too slow for checking) and other
## local design choices and visualize RMSE distribution(s) side-by-side
## Not run:    
  ## pre-allocate to save RMSE
  rmse.exh <- rmse.opt <- rmse.nn <- rmse.pw <- rmse.pwnn <- rep(NA, N)
  for(t in 1:N){
     
    WW <- W[[t]]
    WY <- f2d(WW)
       
    ## joint local design exhaustive search via ALC
    j.exh <- laGPsep(WW, 6, 100, X, Y, method="alc", close=10000, lite=FALSE)
    rmse.exh[t] <- sqrt(mean((WY - j.exh$mean)^2))
     
    ## joint local design gradient-based search via ALC
    j.opt <- laGPsep(WW, 6, 100, X, Y, method="alcopt", close=10000, lite=FALSE)
    rmse.opt[t] <- sqrt(mean((WY - j.opt$mean)^2))
    
    ## joint local design exhaustive search via NN
    j.nn <- laGPsep(WW, 6, 100, X, Y, method="nn", close=10000, lite=FALSE)
    rmse.nn[t] <- sqrt(mean((WY - j.nn$mean)^2))
     
    ## pointwise local design via ALC
    pw <- aGPsep(X, Y, WW, start=6, end=50, d=list(max=20), method="alc", verb=0)
    rmse.pw[t] <- sqrt(mean((WY - pw$mean)^2))
     
    ## pointwise local design via NN
    pw.nn <- aGPsep(X, Y, WW, start=6, end=50, d=list(max=20), method="nn", verb=0)   
    rmse.pwnn[t] <- sqrt(mean((WY - pw.nn$mean)^2))
     
    ## progress meter
    print(t)
  }
  
  ## justify the y range
  ylim_RMSE <- log(range(rmse.exh, rmse.opt, rmse.nn, rmse.pw, rmse.pwnn))
     
  ## plot the distribution of RMSE output
  boxplot(log(rmse.exh), log(rmse.opt), log(rmse.nn), log(rmse.pw), log(rmse.pwnn),
          xaxt='n', xlab="", ylab="log(RMSE)", ylim=ylim_RMSE, main="")
  axis(1, at=1:5, labels=c("ALC-ex", "ALC-opt", "NN", "ALC-pw", "NN-pw"), las=1)

## End(Not run)