Title: | Local Approximate Gaussian Process Regression |
---|---|
Description: | Performs approximate GP regression for large computer experiments and spatial datasets. The approximation is based on finding small local designs for prediction (independently) at particular inputs. OpenMP and SNOW parallelization are supported for prediction over a vast out-of-sample testing set; GPU acceleration is also supported for an important subroutine. OpenMP and GPU features may require special compilation. An interface to lower-level (full) GP inference and prediction is provided. Wrapper routines for blackbox optimization under mixed equality and inequality constraints via an augmented Lagrangian scheme, and for large scale computer model calibration, are also provided. For details and tutorial, see Gramacy (2016) <doi:10.18637/jss.v072.i01>. |
Authors: | Robert B. Gramacy <[email protected]>, Furong Sun <[email protected]> |
Maintainer: | Robert B. Gramacy <[email protected]> |
License: | LGPL |
Version: | 1.5-9 |
Built: | 2024-11-21 06:10:45 UTC |
Source: | https://github.com/cran/laGP |
Facilitates localized Gaussian process inference and prediction at a large
set of predictive locations by (essentially) calling laGP
at each location, returning the moments of the predictive
equations and the indices into the design thus obtained
aGP(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), num.gpus = 0, gpu.threads = num.gpus,
    omp.threads = if (num.gpus > 0) 0 else 1,
    nn.gpu = if (num.gpus > 0) nrow(XX) else 0, verb = 1)

aGP.parallel(cls, XX, chunks = length(cls), X, Z, start = 6, end = 50,
    d = NULL, g = 1/10000,
    method = c("alc", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), num.gpus = 0, gpu.threads = num.gpus,
    omp.threads = if (num.gpus > 0) 0 else 1,
    nn.gpu = if (num.gpus > 0) nrow(XX) else 0, verb = 1)

aGP.R(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), laGP = laGP.R, verb = 1)

aGPsep(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "nn"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), omp.threads = 1, verb = 1)

aGPsep.R(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "nn"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), laGPsep = laGPsep.R, verb = 1)

aGP.seq(X, Z, XX, d, methods = rep("alc", 2), M = NULL, ncalib = 0, ...)
X |
a |
Z |
a vector of responses/dependent values with |
XX |
a |
start |
the number of Nearest Neighbor (NN) locations to start each
independent call to |
end |
the total size of the local designs; |
d |
a prior or initial setting for the lengthscale
parameter in a Gaussian correlation function; a (default)
|
g |
a prior or initial setting for the nugget parameter; a
|
method |
specifies the method by which |
methods |
for |
Xi.ret |
a scalar logical indicating whether or not a |
close |
a non-negative integer |
numrays |
a scalar integer indicating the number of rays for each
greedy search; only relevant when |
laGP |
applicable only to the R-version |
laGPsep |
applicable only to the R-version |
num.gpus |
applicable only to the C-version |
gpu.threads |
applicable only to the C-version |
omp.threads |
applicable only to the C-version |
nn.gpu |
a scalar non-negative integer between |
verb |
a non-negative integer specifying the verbosity level; |
cls |
a cluster object created by |
chunks |
a scalar integer indicating the number of chunks to break |
M |
an optional function taking two matrix inputs, of |
ncalib |
an integer between 1 and |
... |
other arguments passed from |
This function invokes laGP
with argument Xref
= XX[i,]
for each i=1:nrow(XX)
, building up local designs,
inferring local correlation parameters, and
obtaining predictive locations independently for each location. For
more details see laGP
.
The function aGP.R
is a prototype R-only version for
debugging and transparency purposes. It is slower than
aGP
, which is primarily in C. However it may be
useful for developing new programs that involve similar subroutines.
Note that aGP.R
may provide different output than aGP
due to differing library subroutines deployed in R and C.
The function aGP.parallel
allows aGP
to be called on segments
of the XX
matrix distributed to a cluster created by parallel.
It breaks XX
into chunks
which are sent to aGP
workers pointed to by the entries of cls
. The aGP.parallel
function
collects the outputs from each chunk before returning an object
almost identical to what would have been returned from a single aGP
call. On a single (SMP) node, this is a poor man's version of
the OpenMP version described below. On multiple nodes both can be used.
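As a minimal sketch (assuming the data objects X, Z, and XX are built as in the examples below; the cluster size of four is arbitrary), a cluster-based call might look like:

library(parallel)
cls <- makeCluster(4)                  ## four worker processes on one node
clusterEvalQ(cls, library(laGP))       ## precautionary: load laGP on each worker
out <- aGP.parallel(cls, XX, chunks=length(cls), X, Z, method="alc", verb=0)
stopCluster(cls)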
If compiled with OpenMP flags, the independent calls to
laGP
will be
farmed out to threads allowing them to proceed in parallel - obtaining
nearly linear speed-ups. At this time aGP.R
does not
facilitate parallel computation, although a future version may exploit
the parallel functionality for clustered parallel execution.
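For example, on a build compiled with OpenMP support, a sketch of a threaded call is simply:

out <- aGP(X, Z, XX, method="alc", omp.threads=4, verb=0)  ## four CPU threads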
If num.gpus > 0
then the ALC part of the independent
calculations performed by each thread will be offloaded to a GPU.
If both gpu.threads >= 1
and omp.threads >= 1
,
some of the ALC calculations will be done on the GPUs, and some
on the CPUs. In our own experimentation we have not found this
to lead to large speedups relative to omp.threads = 0
when
using GPUs. For more details, see Gramacy, Niemi, & Weiss (2014).
The aGP.seq
function is provided primarily for use in calibration
exercises, see Gramacy, et al. (2015). It automates a sequence of
aGP
calls, each with a potentially different method,
successively feeding the previous estimate of local lengthscale (d
)
in as an initial set of values for the next call. It also allows the
use of aGP
to be bypassed, feeding the inputs into a user-supplied
M
function instead. This feature is enabled when
methods = FALSE
. The M
function takes two matrices
(same number of rows) as inputs, where the first ncol(X) - ncalib
columns represent
“field data” inputs shared by the physical and computer model
(in the calibration context), and the remaining ncalib
are
the extra tuning or calibration parameters required to evaluate the computer
model. For examples illustrating aGP.seq
please see the
documentation file for discrep.est
and demo("calib").
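As a minimal sketch of an emulation-based aGP.seq call, mirroring demo("calib") and assuming the objects XT, Z, Xu, and da are constructed as in the examples below:

## two-pass local approximate emulation at field inputs augmented with a
## candidate calibration setting (the last ncalib=2 columns of Xu)
ehat <- aGP.seq(XT, Z, Xu, da, methods=rep("alcray", 2), ncalib=2,
                omp.threads=1, verb=0)
## ehat$mean then plays the role of Yhat in discrep.est(X, Y, ehat$mean, ...)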
The output is a list
with the following components.
mean |
a vector of predictive means of length |
var |
a vector of predictive variances of length
|
llik |
a vector indicating the log likelihood/posterior
probability of the data/parameter(s) under the chosen sub-design for
each predictive location in |
time |
a scalar giving the passage of wall-clock time elapsed for (substantive parts of) the calculation |
method |
a copy of the |
d |
a full-list version of the |
g |
a full-list version of the |
mle |
if |
Xi |
when |
close |
a copy of the input argument |
The aGP.seq
function only returns the output from the final aGP
call.
When methods = FALSE
and M
is supplied, the returned object
is a data frame with a mean
column indicating the output of the computer
model run, and a var
column, which at this time is zero.
aGPsep
provides the same functionality as aGP
but deploys
a separable covariance function. Criteria (method
s) EFI and
MSPE are not supported by aGPsep
at this time.
Note that using method="nn" gives the same result as specifying
gives the same result as specifying
start=end
, however at some extra computational expense.
At this time, this function provides no facility to find local designs
for the subset of predictive locations XX
jointly, i.e.,
providing a matrix Xref
to laGP
. See laGP
for more details/support for this alternative.
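As a hedged sketch of that alternative, assuming X, Z, and XX are as in the examples below, laGP can be called directly with several reference rows:

## one local design built jointly for five nearby predictive locations
o <- laGP(Xref=XX[1:5,], start=6, end=50, X=X, Z=Z, method="alc")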
The use of OpenMP threads with aGPsep
is not as efficient as with
aGP
when calculating MLEs with respect to the lengthscale (i.e.,
d=list(mle=TRUE, ...)
). The reason is that the lbfgsb
C
entry point uses static variables, and is therefore not thread safe.
To circumvent this problem, an OpenMP critical
pragma is used,
which can create a small bottleneck.
Robert B. Gramacy [email protected]
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 9.) https://bobby.gramacy.com/surrogates/
R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via
Local Approximate Gaussian Processes in R, Journal of Statistical
Software, 72(1), 1-46; doi:10.18637/jss.v072.i01
or see vignette("laGP")
R.B. Gramacy and D.W. Apley (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), pp. 561-578; preprint on arXiv:1303.0383; https://arxiv.org/abs/1303.0383
R.B. Gramacy, J. Niemi, R.M. Weiss (2014). Massively parallel approximate Gaussian process regression. SIAM/ASA Journal on Uncertainty Quantification, 2(1), pp. 568-584; preprint on arXiv:1310.5182; https://arxiv.org/abs/1310.5182
R.B. Gramacy and B. Haaland (2016). Speeding up neighborhood search in local Gaussian process prediction. Technometrics, 58(3), pp. 294-303; preprint on arXiv:1409.0074 https://arxiv.org/abs/1409.0074
vignette("laGP")
,
laGP
, alcGP
, mspeGP
, alcrayGP
,
makeCluster
, clusterApply
## first, a "computer experiment" ## Simple 2-d test function used in Gramacy & Apley (2014); ## thanks to Lee, Gramacy, Taddy, and others who have used it before f2d <- function(x, y=NULL) { if(is.null(y)){ if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2) y <- x[,2]; x <- x[,1] } g <- function(z) return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1))) z <- -g(x)*g(y) } ## build up a design with N=~40K locations x <- seq(-2, 2, by=0.02) X <- expand.grid(x, x) Z <- f2d(X) ## predictive grid with NN=400 locations, ## change NN to 10K (length=100) to mimic setup in Gramacy & Apley (2014) ## the low NN set here is for fast CRAN checks xx <- seq(-1.975, 1.975, length=10) XX <- expand.grid(xx, xx) ZZ <- f2d(XX) ## get the predictive equations, first based on Nearest Neighbor out <- aGP(X, Z, XX, method="nn", verb=0) ## RMSE sqrt(mean((out$mean - ZZ)^2)) ## Not run: ## refine with ALC out2 <- aGP(X, Z, XX, method="alc", d=out$mle$d) ## RMSE sqrt(mean((out2$mean - ZZ)^2)) ## visualize the results par(mfrow=c(1,3)) image(xx, xx, matrix(out2$mean, nrow=length(xx)), col=heat.colors(128), xlab="x1", ylab="x2", main="predictive mean") image(xx, xx, matrix(out2$mean-ZZ, nrow=length(xx)), col=heat.colors(128), xlab="x1", ylab="x2", main="bias") image(xx, xx, matrix(sqrt(out2$var), nrow=length(xx)), col=heat.colors(128), xlab="x1", ylab="x2", main="sd") ## refine with MSPE out3 <- aGP(X, Z, XX, method="mspe", d=out2$mle$d) ## RMSE sqrt(mean((out3$mean - ZZ)^2)) ## End(Not run) ## version with ALC-ray which is much faster than the ones not ## run above out2r <- aGP(X, Z, XX, method="alcray", d=out$mle$d, verb=0) sqrt(mean((out2r$mean - ZZ)^2)) ## a simple example with estimated nugget if(require("MASS")) { ## motorcycle data and predictive locations X <- matrix(mcycle[,1], ncol=1) Z <- mcycle[,2] XX <- matrix(seq(min(X), max(X), length=100), ncol=1) ## first stage out <- aGP(X=X, Z=Z, XX=XX, end=30, g=list(mle=TRUE), verb=0) ## plot smoothed versions of the estimated parameters par(mfrow=c(2,1)) df <- data.frame(y=log(out$mle$d), XX) lo <- loess(y~., data=df, span=0.25) plot(XX, log(out$mle$d), type="l") lines(XX, lo$fitted, col=2) dfnug <- data.frame(y=log(out$mle$g), XX) lonug <- loess(y~., data=dfnug, span=0.25) plot(XX, log(out$mle$g), type="l") lines(XX, lonug$fitted, col=2) ## second stage design out2 <- aGP(X=X, Z=Z, XX=XX, end=30, verb=0, d=list(start=exp(lo$fitted), mle=FALSE), g=list(start=exp(lonug$fitted))) ## plot the estimated surface par(mfrow=c(1,1)) plot(X,Z) df <- 20 s2 <- out2$var*(df-2)/df q1 <- qt(0.05, df)*sqrt(s2) + out2$mean q2 <- qt(0.95, df)*sqrt(s2) + out2$mean lines(XX, out2$mean) lines(XX, q1, col=1, lty=2) lines(XX, q2, col=1, lty=2) } ## compare to the single-GP result provided in the mleGP documentation
Calculate the active learning Cohn (ALC) statistic, mean-squared predictive error (MSPE) or expected Fisher information (fish) for a Gaussian process (GP) predictor relative to a set of reference locations, towards sequential design or local search for Gaussian process regression
alcGP(gpi, Xcand, Xref = Xcand, parallel = c("none", "omp", "gpu"), verb = 0)
alcGPsep(gpsepi, Xcand, Xref = Xcand, parallel = c("none", "omp", "gpu"), verb = 0)
alcrayGP(gpi, Xref, Xstart, Xend, verb = 0)
alcrayGPsep(gpsepi, Xref, Xstart, Xend, verb = 0)
ieciGP(gpi, Xcand, fmin, Xref = Xcand, w = NULL, nonug = FALSE, verb = 0)
ieciGPsep(gpsepi, Xcand, fmin, Xref = Xcand, w = NULL, nonug = FALSE, verb = 0)
mspeGP(gpi, Xcand, Xref = Xcand, fi = TRUE, verb = 0)
fishGP(gpi, Xcand)
alcoptGP(gpi, Xref, start, lower, upper, maxit = 100, verb = 0)
alcoptGPsep(gpsepi, Xref, start, lower, upper, maxit = 100, verb = 0)
dalcGP(gpi, Xcand, Xref = Xcand, verb = 0)
dalcGPsep(gpsepi, Xcand, Xref = Xcand, verb = 0)
gpi |
a C-side GP object identifier (positive integer);
e.g., as returned by |
gpsepi |
a C-side separable GP object identifier (positive integer);
e.g., as returned by |
Xcand |
a |
fmin |
for |
Xref |
a |
parallel |
a switch indicating if any parallel calculation of
the criteria ( |
Xstart |
a |
Xend |
a |
fi |
a scalar logical indicating if the expected Fisher information portion
of the expression (MSPE is essentially |
w |
weights on the reference locations |
nonug |
a scalar logical indicating if a (nonzero) nugget should be used in the predictive
equations behind IECI calculations; this allows the user to toggle improvement via predictive
mean uncertainty versus full predictive uncertainty. The latter (default |
verb |
a non-negative integer specifying the verbosity level; |
start |
initial values to the derivative-based search via |
lower , upper
|
bounds on the derivative-based search via |
maxit |
the maximum number of iterations (default |
The best way to see how these functions are used in the context of local
approximation is to inspect the code in the laGP.R
function.
Otherwise they are pretty self-explanatory. They evaluate the ALC, MSPE, and EFI quantities outlined in Gramacy & Apley (2015). ALC is originally due to Seo, et al. (2000). The ray-based search is described by Gramacy & Haaland (2016).
MSPE and EFI calculations are not supported for separable GP models, i.e.,
there are no mspeGPsep
or fishGPsep
functions.
alcrayGP
and alcrayGPsep
allow only one reference location
(nrow(Xref) = 1
). alcoptGP
and alcoptGPsep
allow multiple
reference locations. These optimize a continuous ALC analog in its natural logarithm
using the starting locations, bounding boxes and (stored) GP provided by gpi
or gpsepi
,
and finally snap the solution back to the candidate grid. For details, see
Sun, et al. (2017).
Note that ieciGP
and ieciGPsep
, which are for optimization via
integrated expected conditional improvement (Gramacy & Lee, 2011) are
“alpha” functionality and are not fully documented at this time.
Except for alcoptGP
, alcoptGPsep
, dalcGP
, and dalcGPsep
, a vector of length nrow(Xcand)
is returned
filled with values corresponding to the desired statistic
par |
the best set of parameters/input configuration found on optimization |
value |
the optimized objective value corresponding to output |
its |
a two-element integer vector giving the number of calls to the object function and the gradient respectively. |
msg |
a character string giving any additional information returned by the optimizer, or |
convergence |
An integer code. |
alcs |
reduced predictive variance averaged over the reference locations |
dalcs |
the derivative of |
Robert B. Gramacy [email protected] and Furong Sun [email protected]
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 9.) https://bobby.gramacy.com/surrogates/
F. Sun, R.B. Gramacy, B. Haaland, E. Lawrence, and A. Walker (2019). Emulating satellite drag from large simulation experiments, SIAM/ASA Journal on Uncertainty Quantification, 7(2), pp. 720-759; preprint on arXiv:1712.00182; https://arxiv.org/abs/1712.00182
R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via
Local Approximate Gaussian Processes in R, Journal of Statistical
Software, 72(1), 1-46; doi:10.18637/jss.v072.i01
or see vignette("laGP")
R.B. Gramacy and B. Haaland (2016). Speeding up neighborhood search in local Gaussian process prediction, Technometrics, 58(3), pp. 294-303; preprint on arXiv:1409.0074; https://arxiv.org/abs/1409.0074
R.B. Gramacy and D.W. Apley (2015). Local Gaussian process approximation for large computer experiments, Journal of Computational and Graphical Statistics, 24(2), pp. 561-578; preprint on arXiv:1303.0383; https://arxiv.org/abs/1303.0383
R.B. Gramacy, J. Niemi, R.M. Weiss (2014). Massively parallel approximate Gaussian process regression, SIAM/ASA Journal on Uncertainty Quantification, 2(1), pp. 568-584; preprint on arXiv:1310.5182; https://arxiv.org/abs/1310.5182
R.B. Gramacy, H.K.H. Lee (2011). Optimization under unknown constraints, Valencia discussion paper, in Bayesian Statistics 9. Oxford University Press; preprint on arXiv:1004.4027; https://arxiv.org/abs/1004.4027
S. Seo, M., Wallat, T. Graepel, K. Obermayer (2000). Gaussian Process Regression: Active Data Selection and Test Point Rejection, In Proceedings of the International Joint Conference on Neural Networks, vol. III, 241-246. IEEE
## this follows the example in predGP, but only evaluates ## information statistics documented here ## Simple 2-d test function used in Gramacy & Apley (2015); ## thanks to Lee, Gramacy, Taddy, and others who have used it before f2d <- function(x, y=NULL) { if(is.null(y)) { if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2) y <- x[,2]; x <- x[,1] } g <- function(z) return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1))) z <- -g(x)*g(y) } ## design with N=441 x <- seq(-2, 2, length=11) X <- expand.grid(x, x) Z <- f2d(X) ## fit a GP gpi <- newGP(X, Z, d=0.35, g=1/1000, dK=TRUE) ## predictive grid with NN=400 xx <- seq(-1.9, 1.9, length=20) XX <- expand.grid(xx, xx) ## predict alc <- alcGP(gpi, XX) mspe <- mspeGP(gpi, XX) fish <- fishGP(gpi, XX) ## visualize the result par(mfrow=c(1,3)) image(xx, xx, matrix(sqrt(alc), nrow=length(xx)), col=heat.colors(128), xlab="x1", ylab="x2", main="sqrt ALC") image(xx, xx, matrix(sqrt(mspe), nrow=length(xx)), col=heat.colors(128), xlab="x1", ylab="x2", main="sqrt MSPE") image(xx, xx, matrix(log(fish), nrow=length(xx)), col=heat.colors(128), xlab="x1", ylab="x2", main="log fish") ## clean up deleteGP(gpi) ## ## Illustrating some of the other functions in a sequential design context, ## using X and XX above ## ## new, much bigger design x <- seq(-2, 2, by=0.02) X <- expand.grid(x, x) Z <- f2d(X) ## first build a local design of size 25, see laGP documentation out <- laGP.R(XX, start=6, end=25, X, Z, method="alc", close=10000) ## extract that design and fit GP XC <- X[out$Xi,] ## inputs ZC <- Z[out$Xi] ## outputs gpi <- newGP(XC, ZC, d=out$mle$d, g=out$g$start) ## calculate the ideal "next" location via continuous ALC optimization alco <- alcoptGP(gpi=gpi, Xref=XX, start=c(0,0), lower=range(x)[1], upper=range(x)[2]) ## alco$par is the "new" location; calculate distances between candidates (remaining ## unchosen X locations) and this solution Xcan <- X[-out$Xi,] D <- distance(Xcan, matrix(alco$par, ncol=ncol(Xcan))) ## snap the new location back to the candidate set lab <- which.min(D) xnew <- Xcan[lab,] ## add xnew to the local design, remove it from Xcan, and repeat ## evaluate the derivative at this new location dalc <- dalcGP(gpi=gpi, Xcand=matrix(xnew, nrow=1), Xref=XX) ## clean up deleteGP(gpi)
Provides bootstrapped block Latin hypercube subsampling under a given data set to aid in consistent estimation of a global separable lengthscale parameter
blhs(y, X, m)
blhs.loop(y, X, m, K, da, g = 1e-3, maxit = 100, verb = 0, plot.it = FALSE)
y |
a vector of responses/dependent values with |
X |
a |
m |
a positive scalar integer giving the number of divisions on each coordinate of input space defining the block structure |
K |
a positive scalar integer specifying the number of Bootstrap replicates desired |
da |
a lengthscale prior, say as generated by |
g |
a positive scalar giving the fixed nugget value of the nugget parameter; by default |
maxit |
a positive scalar integer giving the maximum number of iterations for MLE calculations via |
verb |
a non-negative integer specifying the verbosity level; |
plot.it |
|
Bootstrapped block Latin hypercube subsampling (BLHS) yields a global lengthscale estimator
which is asymptotically consistent with the MLE calculated on the full data set. However, since it
works on data subsets, it comes at a much reduced computational cost. Intuitively, the BLHS
guarantees a good mix of short and long pairwise distances. A single bootstrap LH subsample
may be obtained by dividing each dimension of the input space equally into m
intervals, yielding m^p mutually exclusive hypercubes. It is easy to show
that the average number of observations in each hypercube is N/m^p
if there are N samples in the original design. From each of these hypercubes,
m blocks are randomly selected following the LH paradigm, i.e., so that
only one interval is chosen from each of the m segments. The average number of
observations in the subsample, combining the m randomly selected blocks,
is N/m^(p-1). Ensuring a subsample size of at least one therefore
requires m^(p-1) <= N, thereby linking the parameter m to computational
effort. Smaller m is preferred so long
as GP inference on data of that size remains tractable. Since the blocks follow
an LH structure, the resulting sub-design inherits the usual LHS properties,
e.g., retaining marginal properties like univariate stratification modulo features present in the original,
large N
, design.
For more details, see Liu (2014), Zhao, et al. (2017) and Sun, et al. (2019).
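As a quick check of the sizes involved, a sketch using the borehole setup from the example below (N = 100000, p = 8):

N <- 100000; p <- 8; m <- 2
N / m^p          ## average observations per hypercube block, about 390
N / m^(p-1)      ## average subsample size from the m selected blocks, about 781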
blhs
returns the subsampled input space and the corresponding responses.
blhs.loop
returns the median of the K
lengthscale maximum likelihood estimates, the subsampled data size to which
that corresponds, and the subsampled data, including the input space and the responses, from the bootstrap iterations
blhs
returns
xs |
the subsampled input space |
ys |
the subsampled responses, |
blhs.loop
returns
that |
the lengthscale estimate (median), |
ly |
the subsampled data size (median) |
xm |
the subsampled input space (median) |
ym |
the subsampled responses (median) |
This implementation assumes that X
has been coded to the unit cube [0, 1]^p, where p = ncol(X).
X
should be relatively homogeneous. A space-filling design (input) X
is ideal, but not required.
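A minimal sketch of coding the inputs to the unit cube before subsampling, assuming X is a numeric matrix and Y the corresponding responses:

r <- apply(X, 2, range)                           ## per-column ranges
Xc <- scale(X, center=r[1,], scale=r[2,]-r[1,])   ## map each column to [0,1]
sub <- blhs(y=Y, X=Xc, m=6)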
Robert B. Gramacy [email protected] and Furong Sun [email protected]
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 9.) https://bobby.gramacy.com/surrogates/
F. Sun, R.B. Gramacy, B. Haaland, E. Lawrence, and A. Walker (2019). Emulating satellite drag from large simulation experiments, SIAM/ASA Journal on Uncertainty Quantification, 7(2), pp. 720-759; preprint on arXiv:1712.00182; https://arxiv.org/abs/1712.00182
Y. Zhao, Y. Hung, and Y. Amemiya (2017). Efficient Gaussian Process Modeling using Experimental Design-Based Subagging, Statistica Sinica, to appear;
Yufan Liu (2014) Recent Advances in Computer Experiment Modeling. Ph.D. Thesis at Rutgers, The State University of New Jersey. https://dx.doi.org/doi:10.7282/T38G8J1H
# input space based on latin-hypercube sampling (not required) # two dimensional example with N=216 sized sample if(require(lhs)) { X <- randomLHS(216, 2) } else { X <- matrix(runif(216*2), ncol=2) } # pseudo responses, not important for visualizing design Y <- runif(216) ## BLHS sample with m=6 divisions in each coordinate sub <- blhs(y=Y, X=X, m=6) Xsub <- sub$xs # the bootstrapped subsample # visualization plot(X, xaxt="n", yaxt="n", xlim=c(0,1), ylim=c(0,1), xlab="factor 1", ylab="factor 2", col="cyan", main="BLHS") b <- seq(0, 1, by=1/6) abline(h=b, v=b, col="black", lty=2) axis(1, at=seq (0, 1, by=1/6), cex.axis=0.8, labels=expression(0, 1/6, 2/6, 3/6, 4/6, 5/6, 1)) axis(2, at=seq (0, 1, by=1/6), cex.axis=0.8, labels=expression(0, 1/6, 2/6, 3/6, 4/6, 5/6, 1), las=1) points(Xsub, col="red", pch=19, cex=1.25) ## Comparing global lengthscale MLE based on BLHS and random subsampling ## Not run: # famous borehole function borehole <- function(x){ rw <- x[1] * (0.15 - 0.05) + 0.05 r <- x[2] * (50000 - 100) + 100 Tu <- x[3] * (115600 - 63070) + 63070 Tl <- x[5] * (116 - 63.1) + 63.1 Hu <- x[4] * (1110 - 990) + 990 Hl <- x[6] * (820 - 700) + 700 L <- x[7] * (1680 - 1120) + 1120 Kw <- x[8] * (12045 - 9855) + 9855 m1 <- 2 * pi * Tu * (Hu - Hl) m2 <- log(r / rw) m3 <- 1 + 2*L*Tu/(m2*rw^2*Kw) + Tu/Tl return(m1/m2/m3) } N <- 100000 # number of observations if(require(lhs)) { xt <- randomLHS(N, 8) # input space } else { xt <- matrix(runif(N*8), ncol=8) } yt <- apply(xt, 1, borehole) # response colnames(xt) <- c("rw", "r", "Tu", "Tl", "Hu", "Hl", "L", "Kw") ## prior on the GP lengthscale parameter da <- darg(list(mle=TRUE, max=100), xt) ## make space for two sets of boxplots par(mfrow=c(1,2)) # BLHS calculating with visualization of the K MLE lengthscale estimates K <- 10 # number of Bootstrap samples; Sun, et al (2017) uses K <- 31 sub_blhs <- blhs.loop(y=yt, X=xt, K=K, m=2, da=da, maxit=200, plot.it=TRUE) # a random subsampling analog for comparison sn <- sub_blhs$ly # extract a size that is consistent with the BLHS that.rand <- matrix(NA, ncol=8, nrow=K) for(i in 1:K){ sub <- sample(1:nrow(xt), sn) gpsepi <- newGPsep(xt[sub,], yt[sub], d=da$start, g=1e-3, dK=TRUE) mle <- mleGPsep(gpsepi, tmin=da$min, tmax=10*da$max, ab=da$ab, maxit=200) deleteGPsep(gpsepi) that.rand[i,] <- mle$d } ## put random boxplots next to BLHS ones boxplot(that.rand, xlab="input", ylab="theta-hat", col=2, main="random subsampling") ## End(Not run)
Generate empirical Bayes regularization (priors) and choose initial values and ranges for (isotropic) lengthscale and nugget parameters to a Gaussian correlation function for a GP regression model
darg(d, X, samp.size = 1000)
garg(g, y)
d |
can be |
g |
can be |
X |
a |
y |
a vector of responses/dependent values |
samp.size |
a scalar integer indicating a subset size of |
These functions use aspects of the data, either X
or y
,
to form weakly informative default priors and choose initial values
for a lengthscale and nugget parameter. This is useful since the
likelihood can sometimes be very flat, and even with proper priors
inference can be sensitive to the specification of those priors
and any initial search values. The focus here is on avoiding pathologies
while otherwise remaining true to the spirit of MLE calculation.
darg
output specifies MLE inference (out$mle = TRUE
)
by default, whereas garg
instead fixes the nugget at the starting value,
which may be sensible for emulating deterministic computer simulation data;
when out$mle = FALSE
the calculated range outputs c(out$min, out$max)
are set to dummy values that are ignored in other parts of the laGP package.
darg
calculates a Gaussian distance matrix between all pairs of
X
rows, or a subsample of rows of size samp.size
. From
those distances it chooses the range and start values from the range
of (non-zero) distances and the 0.1
quantile, respectively.
The Gamma prior values have a shape of out$a = 3/2
and a rate
out$b
chosen by the incomplete Gamma inverse function to put
0.95
probability below out$max
.
garg
is similar except that it works with (y - mean(y))^2
instead of the pairwise distances of darg
. The only difference
is that the starting value is chosen as the 2.5% quantile.
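A rough sketch of that logic (not the package's exact internal code), assuming X and Z are a design matrix and response vector:

D <- distance(X)                             ## squared Euclidean distances
dvals <- D[upper.tri(D)]
dstart <- quantile(dvals[dvals > 0], 0.1)    ## darg-style starting lengthscale
gstart <- quantile((Z - mean(Z))^2, 0.025)   ## garg-style starting nugget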
Both functions return a list containing the following entries. If the
input object (d
or g
) specifies one of the values then
that value is copied to the same list entry on output. See the
Details section for how these values are calculated
mle |
by default, |
start |
starting value chosen from the quantiles of
|
min |
minimum value in allowable range for the parameter - for future inference purposes |
max |
maximum value in allowable range for the parameter - for future inference purposes |
ab |
shape and rate parameters specifying a Gamma prior for the parameter |
Robert B. Gramacy [email protected]
vignette("laGP")
,
laGP
, aGP
,
mleGP
, distance
, llikGP
## motorcycle data
if(require("MASS")) {
  X <- matrix(mcycle[,1], ncol=1)
  Z <- mcycle[,2]
  ## get darg and garg
  darg(NULL, X)
  garg(list(mle=TRUE), Z)
}
Frees memory allocated by a particular C-side Gaussian process object, or all GP objects currently allocated
deleteGP(gpi)
deleteGPsep(gpsepi)
deleteGPs()
deleteGPseps()
gpi |
a scalar positive integer specifying an allocated isotropic GP object |
gpsepi |
similar to |
Any function calling newGP
or newGPsep
will require destruction
via these functions or there will be a memory leak.
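For example, a minimal allocate-use-free sketch, with X and Z a design matrix and response vector:

gpi <- newGP(X, Z, d=0.35, g=1/1000)   ## allocate a C-side GP object
## ... calls to predGP, mleGP, etc. ...
deleteGP(gpi)                          ## free it when finished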
Nothing is returned
Robert B. Gramacy [email protected]
vignette("laGP")
,
newGP
, predGP
, mleGP
## see examples for newGP, predGP, or mleGP
Estimates the Gaussian process discrepancy/bias and/or noise term in a modularized calibration of a computer model (emulator) to field data, and returns the log likelihood or posterior probability
discrep.est(X, Y, Yhat, d, g, bias = TRUE, clean = TRUE)
X |
a |
Y |
a vector of values with |
Yhat |
a vector with |
d |
a prior or initial setting for the (single/isotropic) lengthscale
parameter in a Gaussian correlation function; a (default)
|
g |
a prior or initial setting for the nugget parameter; a
|
bias |
a scalar logical indicating if a (isotropic)
GP discrepancy should be estimated ( |
clean |
a scalar logical indicating if the C-side GP object should be freed before returning. |
Estimates an isotropic Gaussian correlation Gaussian process (GP) discrepancy
term for the difference between a computer model output (Yhat
) and
field data observations (Y
) at locations X
. The computer model
predictions would typically come from a GP emulation from simulation data,
possibly via aGP
if the computer experiment is large.
This function is used primarily as a subroutine by fcalib
which
defines an objective function for optimization in order to solve the
calibration problem via the method described by Gramacy, et al. (2015),
designed for large computer experiments. However, once calibration is
performed this function can be useful for making comparisons to other methods.
Examples are provided in the fcalib
documentation.
When bias=FALSE
no discrepancy is estimated; only a zero-mean
Gaussian error distribution is assumed
The output object is comprised of the output of jmleGP
, applied to a GP
object built with responses Y - Yhat
. That object is augmented with
a log likelihood, in $ll
, and with a GP index $gpi
when
clean=FALSE
. When bias = FALSE
the output object retains the
same form as above, except with dummy zero-values since calling jmleGP
is not
required
Note that in principle a separable correlation function could be used
(e.g., via newGPsep
and mleGPsep
), however this is not implemented at this time
Robert B. Gramacy [email protected]
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 8.) https://bobby.gramacy.com/surrogates/
R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via
Local Approximate Gaussian Processes in R, Journal of Statistical
Software, 72(1), 1-46; doi:10.18637/jss.v072.i01
or see vignette("laGP")
R.B. Gramacy, D. Bingham, JP. Holloway, M.J. Grosskopf, C.C. Kuranz, E. Rutter, M. Trantham, P.R. Drake (2015). Calibrating a large computer experiment simulating radiative shock hydrodynamics. Annals of Applied Statistics, 9(3) 1141-1168; preprint on arXiv:1410.3293 https://arxiv.org/abs/1410.3293
F. Liu, M. Bayarri and J. Berger (2009). Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis, 4(1) 119-150.
vignette("laGP")
,
jmleGP
, newGP
, aGP.seq
, fcalib
## the example here combines aGP.seq and discrep.est functions; ## it is comprised of snippets from demo("calib"), which contains ## code from the Calibration Section of vignette("laGP") ## Here we generate calibration data using a true calibration ## parameter, u, and then evaluate log posterior probabilities ## and out-of-sample RMSEs for that u value; the fcalib ## documentation repeats this with a single call to fcalib rather ## than first aGP.seq and then discrep.est ## begin data-generation code identical to aGP.seq, discrep.est, fcalib ## example sections and demo("calib") ## M: computer model test functon used in Goh et al, 2013 (Technometrics) ## an elaboration of one from Bastos and O'Hagan, 2009 (Technometrics) M <- function(x,u) { x <- as.matrix(x) u <- as.matrix(u) out <- (1-exp(-1/(2*x[,2]))) out <- out * (1000*u[,1]*x[,1]^3+1900*x[,1]^2+2092*x[,1]+60) out <- out / (100*u[,2]*x[,1]^3+500*x[,1]^2+4*x[,1]+20) return(out) } ## bias: discrepancy function from Goh et al, 2013 bias <- function(x) { x<-as.matrix(x) out<- 2*(10*x[,1]^2+4*x[,2]^2) / (50*x[,1]*x[,2]+10) return(out) } ## beta.prior: marginal beta prior for u, ## defaults to a mode at 1/2 in hypercube beta.prior <- function(u, a=2, b=2, log=TRUE) { if(length(a) == 1) a <- rep(a, length(u)) else if(length(a) != length(u)) stop("length(a) must be 1 or length(u)") if(length(b) == 1) b <- rep(b, length(u)) else if(length(b) != length(u)) stop("length(b) must be 1 or length(u)") if(log) return(sum(dbeta(u, a, b, log=TRUE))) else return(prod(dbeta(u, a, b, log=FALSE))) } ## tgp for LHS sampling library(tgp) rect <- matrix(rep(0:1, 4), ncol=2, byrow=2) ## training and testing inputs ny <- 50; nny <- 1000 X <- lhs(ny, rect[1:2,]) ## computer model train XX <- lhs(nny, rect[1:2,],) ## test ## true (but unknown) setting of the calibration parameter ## for the computer model u <- c(0.2, 0.1) Zu <- M(X, matrix(u, nrow=1)) ZZu <- M(XX, matrix(u, nrow=1)) ## field data response, biased and replicated sd <- 0.5 ## Y <- computer output + bias + noise reps <- 2 ## example from paper uses reps <- 10 Y <- rep(Zu,reps) + rep(bias(X),reps) + rnorm(reps*length(Zu), sd=sd) YYtrue <- ZZu + bias(XX) ## variations: remove the bias or change 2 to 1 to remove replicates ## computer model design nz <- 10000 XT <- lhs(nz, rect) nth <- 1 ## number of threads to use in emulation, demo uses 8 ## augment with physical model design points ## with various u settings XT2 <- matrix(NA, nrow=10*ny, ncol=4) for(i in 1:10) { I <- ((i-1)*ny+1):(ny*i) XT2[I,1:2] <- X } XT2[,3:4] <- lhs(10*ny, rect[3:4,]) XT <- rbind(XT, XT2) ## evaluate the computer model Z <- M(XT[,1:2], XT[,3:4]) ## flag indicating if estimating bias/discrepancy or not bias.est <- TRUE ## two passes of ALC with MLE calculations for aGP.seq methods <- rep("alcray", 2) ## demo uses rep("alc", 2) ## set up priors da <- d <- darg(NULL, XT) g <- garg(list(mle=TRUE), Y) ## end identical data generation code ## now calculate log posterior and do out-of-sample RMSE calculation ## for true calibration parameter value (u). 
You could repeat ## this for an estimate value from demo("calib"), for example ## u.hat <- c(0.8236673, 0.1406989) ## first log posterior ## emulate at field-data locations Xu Xu <- cbind(X, matrix(rep(u, ny), ncol=2, byrow=TRUE)) ehat.u <- aGP.seq(XT, Z, Xu, da, methods, ncalib=2, omp.threads=nth, verb=0) ## estimate discrepancy from the residual cmle.u <- discrep.est(X, Y, ehat.u$mean, d, g, bias.est, FALSE) cmle.u$ll <- cmle.u$ll + beta.prior(u) print(cmle.u$ll) ## compare to same calculation with u.hat above ## now RMSE ## Not run: ## predictive design with true calibration parameter XXu <- cbind(XX, matrix(rep(u, nny), ncol=2, byrow=TRUE)) ## emulate at predictive design ehat.oos.u <- aGP.seq(XT, Z, XXu, da, methods, ncalib=2, omp.threads=nth, verb=0) ## predict via discrepency YYm.pred.u <- predGP(cmle.u$gp, XX) ## add in emulation YY.pred.u <- YYm.pred.u$mean + ehat.oos.u$mean ## calculate RMSE rmse.u <- sqrt(mean((YY.pred.u - YYtrue)^2)) print(rmse.u) ## compare to same calculation with u.hat above ## clean up deleteGP(cmle.u$gp) ## End(Not run)
Calculate the squared Euclidean distance between pairs of points and return a distance matrix
distance(X1, X2 = NULL)
X1 |
a |
X2 |
an optional |
If X2 = NULL
distances between X1
and itself are
calculated, resulting in an nrow(X1)
-by-nrow(X1)
distance
matrix. Otherwise the result is nrow(X1)
-by-nrow(X2)
and
contains distances between X1
and X2
.
Calling distance(X)
is the same as distance(X, X).
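Since the entries are squared Euclidean distances, a sketch of the relationship to base R's dist is:

D  <- distance(X)                ## squared Euclidean distances
D2 <- as.matrix(dist(X))^2       ## equivalent via base R
max(abs(D - D2))                 ## should be numerically negligible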
The output is a matrix
, whose dimensions are described in the Details
section above
Robert B. Gramacy [email protected]
x <- seq(-2, 2, length=11)
X <- as.matrix(expand.grid(x, x))
## predictive grid with NN=400
xx <- seq(-1.9, 1.9, length=20)
XX <- as.matrix(expand.grid(xx, xx))
D <- distance(X)
DD <- distance(X, XX)
Defines an objective function for performing blackbox optimization towards solving a modularized calibration of large computer model simulation to field data
fcalib(u, XU, Z, X, Y, da, d, g, uprior = NULL, methods = rep("alc", 2),
       M = NULL, bias = TRUE, omp.threads = 1, save.global = FALSE, verb = 1)
u |
a vector of length |
XU |
a |
Z |
a vector of responses/dependent values with |
X |
a |
Y |
a vector of values with |
da |
for emulating |
d |
for the discrepancy between emulations |
g |
for the nugget in the GP model for the discrepancy between emulation
|
uprior |
an optional function taking |
methods |
a sequence of local search methods to be deployed when emulating
|
M |
a computer model “simulation” function taking two matrices as
inputs, to be used in lieu of emulation; see |
bias |
a scalar logical indicating whether a GP discrepancy or bias term should
be estimated via |
omp.threads |
a scalar positive integer indicating the number
of threads to use for SMP parallel processing; see |
save.global |
an environment, e.g., |
verb |
a non-negative integer specifying the verbosity level; |
Gramacy, et al. (2015) defined an objective function which, when optimized,
returns a setting of calibration parameters under a setup akin to
the modularized calibration method of Liu, et al., (2009). The fcalib
function returns a log density (likelihood or posterior probability) value
obtained by performing emulation at a set of inputs X
augmented
with a value of the calibration parameter, u
. The emulator
is trained on XU
and Z
, presumed to be very large
relative to the size of the field data set X
and Y
,
necessitating the use of approximate methods like aGP
,
via aGP.seq
. The
emulated values, call them Yhat
are fed along with X
and
Y
into the discrep.est
function, whose
likelihood or posterior calculation serves as a measure of merit for
the value u
.
The fcalib
function is deterministic but, as Gramacy, et al. (2015)
described, can result in a rugged objective surface for optimization,
meaning that conventional methods, like those in optim
are unlikely to work well. They instead recommend using a blackbox
derivative-free method, like NOMAD (Le Digabel, 2011). In our example
below we use the implementation in the crs package, which provides
an R wrapper around the underlying C library.
Note that while fcalib
automates a call first to aGP.seq
and then to discrep.est
, it does not return enough
information to complete, say, an out-of-sample prediction exercise like
the one demonstrated in the discrep.est
documentation.
Therefore, after fcalib
is used in an optimization to
find the best setting of the calibration parameter, u
,
those functions must then be used in post-processing to complete a
prediction exercise. See demo("calib")
or vignette("laGP")
for more details
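As a crude, purely illustrative alternative to the NOMAD-based optimization in demo("calib") (a sketch only, assuming XU, Z, X, Y, da, d, g, beta.prior, methods, M, bias.est, and nth are built as in the example below), fcalib could be evaluated over random candidate values of u, keeping the best:

ucand <- matrix(runif(100*2), ncol=2)   ## random candidates in [0,1]^2
fvals <- apply(ucand, 1, fcalib, XU, Z, X, Y, da, d, g, beta.prior,
               methods, M, bias.est, nth, verb=0)
u.best <- ucand[which.min(fvals),]      ## fcalib returns a negative log density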
Returns a scalar measuring the negative log likelihood or posterior density
of the calibration parameter u
given the other inputs, for
the purpose of optimization over u
Note that in principle a separable correlation function could be used
(e.g., via newGPsep
and mleGPsep
),
however this is not implemented at this time
Robert B. Gramacy [email protected]
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 8.) https://bobby.gramacy.com/surrogates/
R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via
Local Approximate Gaussian Processes in R, Journal of Statistical
Software, 72(1), 1-46; doi:10.18637/jss.v072.i01
or see vignette("laGP")
R.B. Gramacy, D. Bingham, JP. Holloway, M.J. Grosskopf, C.C. Kuranz, E. Rutter, M. Trantham, and P.R. Drake (2015). Calibrating a large computer experiment simulating radiative shock hydrodynamics. Annals of Applied Statistics, 9(3) 1141-1168; preprint on arXiv:1410.3293 https://arxiv.org/abs/1410.3293
F. Liu, M. Bayarri, and J. Berger (2009). Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis, 4(1) 119-150.
S. Le Digabel (2011). Algorithm 909: NOMAD: Nonlinear Optimization with the MADS algorithm. ACM Transactions on Mathematical Software, 37, 4, 44:1-44:15.
J.S. Racine, Z. and Nie (2012). crs: Categorical regression splines. R package version 0.15-18.
vignette("laGP")
,
jmleGP
, newGP
, aGP.seq
, discrep.est
,
snomadr
## the example here illustrates how fcalib combines aGP.seq and ## discrep.est functions, duplicating the example in the discrep.est ## documentation file. It is comprised of snippets from demo("calib"), ## which contains code from the Calibration Section of vignette("laGP") ## Here we generate calibration data using a true calibration ## parameter, u, and then evaluate log posterior probabilities; ## the discrep.est documentation repeats this with by first calling ## aGP.seq and then discrep.est. The answers should be identical, however ## note that a call first to example("fcalib") and then ## example("discrep.est") will generate two random data sets, causing ## the results not to match ## begin data-generation code identical to aGP.seq, discrep.est, fcalib ## example sections and demo("calib") ## M: computer model test function used in Goh et al, 2013 (Technometrics) ## an elaboration of one from Bastos and O'Hagan, 2009 (Technometrics) M <- function(x,u) { x <- as.matrix(x) u <- as.matrix(u) out <- (1-exp(-1/(2*x[,2]))) out <- out * (1000*u[,1]*x[,1]^3+1900*x[,1]^2+2092*x[,1]+60) out <- out / (100*u[,2]*x[,1]^3+500*x[,1]^2+4*x[,1]+20) return(out) } ## bias: discrepancy function from Goh et al, 2013 bias <- function(x) { x<-as.matrix(x) out<- 2*(10*x[,1]^2+4*x[,2]^2) / (50*x[,1]*x[,2]+10) return(out) } ## beta.prior: marginal beta prior for u, ## defaults to a mode at 1/2 in hypercube beta.prior <- function(u, a=2, b=2, log=TRUE) { if(length(a) == 1) a <- rep(a, length(u)) else if(length(a) != length(u)) stop("length(a) must be 1 or length(u)") if(length(b) == 1) b <- rep(b, length(u)) else if(length(b) != length(u)) stop("length(b) must be 1 or length(u)") if(log) return(sum(dbeta(u, a, b, log=TRUE))) else return(prod(dbeta(u, a, b, log=FALSE))) } ## tgp for LHS sampling library(tgp) rect <- matrix(rep(0:1, 4), ncol=2, byrow=2) ## training inputs ny <- 50; X <- lhs(ny, rect[1:2,]) ## computer model train ## true (but unknown) setting of the calibration parameter ## for the computer model u <- c(0.2, 0.1) Zu <- M(X, matrix(u, nrow=1)) ## field data response, biased and replicated sd <- 0.5 ## Y <- computer output + bias + noise reps <- 2 ## example from paper uses reps <- 10 Y <- rep(Zu,reps) + rep(bias(X),reps) + rnorm(reps*length(Zu), sd=sd) ## variations: remove the bias or change 2 to 1 to remove replicates ## computer model design nz <- 10000 XU <- lhs(nz, rect) nth <- 1 ## number of threads to use in emulation, demo uses 8 ## augment with physical model design points ## with various u settings XU2 <- matrix(NA, nrow=10*ny, ncol=4) for(i in 1:10) { I <- ((i-1)*ny+1):(ny*i) XU2[I,1:2] <- X } XU2[,3:4] <- lhs(10*ny, rect[3:4,]) XU <- rbind(XU, XU2) ## evaluate the computer model Z <- M(XU[,1:2], XU[,3:4]) ## flag indicating if estimating bias/discrepancy or not bias.est <- TRUE ## two passes of ALC with MLE calculations for aGP.seq methods <- rep("alcray", 2) ## demo uses rep("alc", 2) ## set up priors da <- d <- darg(NULL, XU) g <- garg(list(mle=TRUE), Y) ## end identical data generation code ## now calculate log posterior for true calibration parameter ## value (u). You could repeat this for an estimate value ## from demo("calib"), for example u.hat <- c(0.8236673, 0.1406989) fcalib(u, XU, Z, X, Y, da, d, g, beta.prior, methods, M, bias.est, nth)
Build a sub-design of X of size end, and infer parameters, for approximate Gaussian process prediction at reference location(s) Xref. Return the moments of those predictive equations, and indices into the local design.
laGP(Xref, start, end, X, Z, d = NULL, g = 1/10000,
     method = c("alc", "alcopt", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
     close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)),
     alc.gpu = FALSE, numstart = if(method[1] == "alcray") ncol(X) else 1,
     rect = NULL, lite = TRUE, verb = 0)
laGP.R(Xref, start, end, X, Z, d = NULL, g = 1/10000,
     method = c("alc", "alcopt", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
     pall = FALSE,
     close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)),
     parallel = c("none", "omp", "gpu"),
     numstart = if(method[1] == "alcray") ncol(X) else 1,
     rect = NULL, lite = TRUE, verb = 0)
laGPsep(Xref, start, end, X, Z, d = NULL, g = 1/10000,
     method = c("alc", "alcopt", "alcray", "nn"), Xi.ret = TRUE,
     close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)),
     alc.gpu = FALSE, numstart = if(method[1] == "alcray") ncol(X) else 1,
     rect = NULL, lite = TRUE, verb = 0)
laGPsep.R(Xref, start, end, X, Z, d = NULL, g = 1/10000,
     method = c("alc", "alcopt", "alcray", "nn"), Xi.ret = TRUE,
     pall = FALSE,
     close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)),
     parallel = c("none", "omp", "gpu"),
     numstart = if(method[1] == "alcray") ncol(X) else 1,
     rect = NULL, lite = TRUE, verb = 0)
Xref |
a vector of length |
start |
the number of Nearest Neighbor (NN) locations for initialization; must
specify |
end |
the total size of the local designs; must have |
X |
a |
Z |
a vector of responses/dependent values with |
d |
a prior or initial setting for the lengthscale
parameter for a Gaussian correlation function; a (default)
|
g |
a prior or initial setting for the nugget parameter; a
|
method |
Specifies the method by which |
Xi.ret |
A scalar logical indicating whether or not a vector of indices
into |
pall |
a scalar logical (for |
close |
a non-negative integer |
alc.gpu |
a scalar |
parallel |
a switch indicating if any parallel calculation of
the criteria is desired. Currently parallelization at this level is only
provided for option |
numstart |
a scalar integer indicating the number of rays for each
greedy search when |
rect |
an optional |
lite |
Similar to the |
verb |
a non-negative integer specifying the verbosity level; |
A sub-design of X of size end is built up according to the criteria prescribed by the method and then used to predict at Xref. The first start locations are NNs in order to initialize the first GP, via newGP or newGPsep, and thereby initialize the search. Each subsequent addition is found via the chosen criterion (method), and the GP fit is updated via updateGP or updateGPsep.

The runtime is cubic in end, although the multiplicative “constant” out front depends on the method chosen, the size of the design X, and close. The "alcray" method has a smaller constant since it does not search over all candidates exhaustively.
After building the sub-design, local MLE/MAP lengthscale (and/or nugget) parameters are estimated, depending on the d and g arguments supplied. This is facilitated by calls to mleGP or jmleGP. Finally, predGP is called on the resulting local GP model, and the parameters of the resulting Student-t distribution(s) are returned. Unless Xi.ret = FALSE, the indices of the local design are also returned.
laGP.R and laGPsep.R are prototype R-only versions provided for debugging and transparency purposes. They are slower than laGP and laGPsep, which are primarily in C, and may not provide identical output in all cases due to differing library implementations used as subroutines; see the Note below for an example. laGP.R and other .R functions in the package may be useful for developing new programs that involve similar subroutines. The current version of laGP.R allows OpenMP and/or GPU parallelization of the criteria (method) if the package is compiled with the appropriate flags. See README/INSTALL in the package source for more information. For algorithmic details, see Gramacy, Niemi, & Weiss (2014).
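A quick sketch, reusing the X, Z and Xref objects built in the Examples section below, of comparing the C and R-only implementations directly (treat it as illustrative rather than canonical; small discrepancies in the selected indices are possible, as described in the Note below):

## compare the C and R-only implementations on the same local design problem
oC <- laGP(Xref, 6, 50, X, Z, method="alc")
oR <- laGP.R(Xref, 6, 50, X, Z, method="alc")
c(C=oC$mean, R=oR$mean)   ## predictive means should essentially agree
mean(oC$Xi %in% oR$Xi)    ## overlap in chosen indices; may be < 1 on ties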
The output is a list
with the following components.
mean |
a vector of predictive means of length |
s2 |
a vector of Student-t scale parameters of length
|
df |
a Student-t degrees of freedom scalar (applies to all
|
llik |
a scalar indicating the maximized log likelihood or log posterior probability of the data/parameter(s) under the chosen sub-design; provided up to an additive constant |
time |
a scalar giving the passage of wall-clock time elapsed for (substantive parts of) the calculation |
method |
a copy of the |
d |
a full-list version of the |
g |
a full-list version of the |
mle |
if |
Xi |
when |
close |
a copy of the input argument |
laGPsep provides the same functionality as laGP but deploys a separable covariance function. However, criteria (methods) EFI and MSPE are not supported. This is considered “beta” functionality at this time.

Note that using method="NN" gives the same result as specifying start=end, though at some extra computational expense.
Handling multiple reference locations (nrow(Xref) > 1) is “beta” functionality. In this case the initial start locations are chosen by applying NN to the average distances to all Xref locations. Using method="alcopt" causes the exhaustive search to be approximated by a continuous analog via closed-form derivatives. See alcoptGP for more details. Although the approximation provided has a spirit similar to method="alcray", in that both methods are intended to offer a thrifty alternative, method="alcray" is not applicable when nrow(Xref) > 1.
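A minimal sketch of this “beta” multi-reference capability, again assuming the X and Z objects from the Examples section below (the particular reference locations here are arbitrary and only for illustration):

## joint local design for three nearby reference locations via "alcopt"
Xref3 <- rbind(c(-1.7, 1.7), c(-1.6, 1.8), c(-1.5, 1.9))
o3 <- laGP(Xref3, 6, 50, X, Z, method="alcopt", lite=FALSE)
length(o3$mean)   ## one predictive mean per row of Xref3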
Differences between the C qsort function and R's order function may cause the local designs returned by laGP and laGP.R (and likewise laGPsep and laGPsep.R) to differ when multiple X values are equidistant to Xref.
Robert B. Gramacy [email protected] and Furong Sun [email protected]
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 9.) https://bobby.gramacy.com/surrogates/
F. Sun, R.B. Gramacy, B. Haaland, E. Lawrence, and A. Walker (2019). Emulating satellite drag from large simulation experiments, SIAM/ASA Journal on Uncertainty Quantification, 7(2), pp. 720-759; preprint on arXiv:1712.00182; https://arxiv.org/abs/1712.00182
R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R. Journal of Statistical Software, 72(1), 1-46; doi:10.18637/jss.v072.i01; or see vignette("laGP")
R.B. Gramacy and B. Haaland (2016). Speeding up neighborhood search in local Gaussian process prediction. Technometrics, 58(3), pp. 294-303; preprint on arXiv:1409.0074 https://arxiv.org/abs/1409.0074
R.B. Gramacy and D.W. Apley (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), pp. 561-578; preprint on arXiv:1303.0383; https://arxiv.org/abs/1303.0383
R.B. Gramacy, J. Niemi, R.M. Weiss (2014). Massively parallel approximate Gaussian process regression. SIAM/ASA Journal on Uncertainty Quantification, 2(1), pp. 568-584; preprint on arXiv:1310.5182; https://arxiv.org/abs/1310.5182
vignette("laGP")
,
aGP
, newGP
, updateGP
,
predGP
, mleGP
, jmleGP
,
alcGP
, mspeGP
, alcrayGP
,
randLine
## path-based local prediction via laGP
## examining a particular laGP call from the larger analysis provided
## in the aGP documentation

## A simple 2-d test function used in Gramacy & Apley (2014);
## thanks to Lee, Gramacy, Taddy, and others who have used it before
f2d <- function(x, y=NULL)
{
  if(is.null(y)) {
    if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2)
    y <- x[,2]; x <- x[,1]
  }
  g <- function(z)
    return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1)))
  z <- -g(x)*g(y)
}

## build up a design with N=~40K locations
x <- seq(-2, 2, by=0.02)
X <- as.matrix(expand.grid(x, x))
Z <- f2d(X)

## optional first pass of nearest neighbor
Xref <- matrix(c(-1.725, 1.725), nrow=TRUE)
out <- laGP(Xref, 6, 50, X, Z, method="nn")

## second pass via ALC, ALCOPT, MSPE, and ALC-ray respectively,
## conditioned on MLE d-values found above
out2 <- laGP(Xref, 6, 50, X, Z, d=out$mle$d)
# out2.alcopt <- laGP(Xref, 6, 50, X, Z, d=out2$mle$d, method="alcopt")
out2.mspe <- laGP(Xref, 6, 50, X, Z, d=out2$mle$d, method="mspe")
out2.alcray <- laGP(Xref, 6, 50, X, Z, d=out2$mle$d, method="alcray")

## look at the different designs
plot(rbind(X[out2$Xi,], X[out2.mspe$Xi,]), type="n",
     xlab="x1", ylab="x2", main="comparing local designs")
points(Xref[1], Xref[2], col=2, cex=0.5)
text(X[out2$Xi,], labels=1:50, cex=0.6)
# text(X[out2.alcopt$Xi,], labels=1:50, cex=0.6, col="forestgreen")
text(X[out2.mspe$Xi,], labels=1:50, cex=0.6, col="blue")
text(X[out2.alcray$Xi,], labels=1:50, cex=0.6, col="red")
legend("right", c("ALC", "ALCOPT", "MSPE", "ALCRAY"),
       text.col=c("black", "forestgreen", "blue", "red"), bty="n")

## compare computational time
c(nn=out$time, alc=out2$time,
  # alcopt=out2.alcopt$time,
  mspe=out2.mspe$time, alcray=out2.alcray$time)

## Not run: 
## Joint path sampling: a comparison between ALC-ex, ALC-opt and NN

## defining a predictive path
wx <- seq(-0.85, 0.45, length=100)
W <- cbind(wx-0.75, wx^3+0.51)

## three comparators from Sun, et al. (2017)
## larger-than-default "close" argument to capture locations nearby path
p.alc <- laGPsep(W, 6, 100, X, Z, close=10000, lite=FALSE)
p.alcopt <- laGPsep(W, 6, 100, X, Z, method="alcopt", lite=FALSE)
## note that close=10*(1000+end) would be the default for method = "alcopt"
p.nn <- laGPsep(W, 6, 100, X, Z, method="nn", close=10000, lite=FALSE)

## time comparison
c(alc=p.alc$time, alcopt=p.alcopt$time, nn=p.nn$time)

## visualization
plot(W, type="l", xlab="x1", ylab="x2", xlim=c(-2.25,0), ylim=c(-0.75,1.25), lwd=2)
points(X[p.alc$Xi,], col=2, cex=0.6)
lines(W[,1]+0.25, W[,2]-0.25, lwd=2)
points(X[p.nn$Xi,1]+0.25, X[p.nn$Xi,2]-0.25, pch=22, col=3, cex=0.6)
lines(W[,1]-0.25, W[,2]+0.25, lwd=2)
points(X[p.alcopt$Xi,1]-0.25, X[p.alcopt$Xi,2]+0.25, pch=23, col=4, cex=0.6)
legend("bottomright", c("ALC-opt", "ALC-ex", "NN"), pch=c(22, 21, 23), col=c(4,2,3))

## End(Not run)
Calculate a Gaussian process (GP) log likelihood or posterior probability with reference to a C-side GP object
llikGP(gpi, dab = c(0, 0), gab = c(0, 0))
llikGPsep(gpsepi, dab = c(0, 0), gab = c(0, 0))
gpi |
a C-side GP object identifier (positive integer);
e.g., as returned by |
gpsepi |
similar to |
dab |
|
gab |
|
An “ab” parameter is a non-negative 2-vector describing shape and rate parameters to a Gamma prior; a zero setting for either value results in no prior being used, in which case a log likelihood is returned. If both ab parameters are specified, then the value returned can be interpreted as a log posterior density. See darg for more information about ab.
A real-valued scalar is returned.
Robert B. Gramacy [email protected]
## partly following the example in mleGP
if(require("MASS")) {

  ## motorcycle data and predictive locations
  X <- matrix(mcycle[,1], ncol=1)
  Z <- mcycle[,2]

  ## get sensible ranges
  d <- darg(NULL, X)
  g <- garg(list(mle=TRUE), Z)

  ## initialize the model
  gpi <- newGP(X, Z, d=d$start, g=g$start)

  ## calculate log likelihood
  llikGP(gpi)

  ## calculate posterior probability
  llikGP(gpi, d$ab, g$ab)

  ## clean up
  deleteGP(gpi)
}
Maximum likelihood/a posteriori inference for (isotropic and separable) Gaussian lengthscale and nugget parameters, marginally or jointly, for Gaussian process regression
mleGP(gpi, param = c("d", "g"), tmin=sqrt(.Machine$double.eps), tmax = -1, ab = c(0, 0), verb = 0) mleGPsep(gpsepi, param=c("d", "g", "both"), tmin=rep(sqrt(.Machine$double.eps), 2), tmax=c(-1,1), ab=rep(0,4), maxit=100, verb=0) mleGPsep.R(gpsepi, param=c("d", "g"), tmin=sqrt(.Machine$double.eps), tmax=-1, ab=c(0,0), maxit=100, verb=0) jmleGP(gpi, drange=c(sqrt(.Machine$double.eps),10), grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0) jmleGP.R(gpi, N=100, drange=c(sqrt(.Machine$double.eps),10), grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0) jmleGPsep(gpsepi, drange=c(sqrt(.Machine$double.eps),10), grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), maxit=100, verb=0) jmleGPsep.R(gpsepi, N=100, drange=c(sqrt(.Machine$double.eps),10), grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), maxit=100, mleGPsep=mleGPsep.R, verb=0)
mleGP(gpi, param = c("d", "g"), tmin=sqrt(.Machine$double.eps), tmax = -1, ab = c(0, 0), verb = 0) mleGPsep(gpsepi, param=c("d", "g", "both"), tmin=rep(sqrt(.Machine$double.eps), 2), tmax=c(-1,1), ab=rep(0,4), maxit=100, verb=0) mleGPsep.R(gpsepi, param=c("d", "g"), tmin=sqrt(.Machine$double.eps), tmax=-1, ab=c(0,0), maxit=100, verb=0) jmleGP(gpi, drange=c(sqrt(.Machine$double.eps),10), grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0) jmleGP.R(gpi, N=100, drange=c(sqrt(.Machine$double.eps),10), grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0) jmleGPsep(gpsepi, drange=c(sqrt(.Machine$double.eps),10), grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), maxit=100, verb=0) jmleGPsep.R(gpsepi, N=100, drange=c(sqrt(.Machine$double.eps),10), grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), maxit=100, mleGPsep=mleGPsep.R, verb=0)
gpi |
a C-side GP object identifier (positive integer);
e.g., as returned by |
gpsepi |
similar to |
N |
for |
param |
for |
tmin |
for |
tmax |
for |
drange |
for |
grange |
for |
ab |
for |
maxit |
for |
dab |
for |
gab |
for |
mleGPsep |
function for internal MLE calculation of the separable
lengthscale parameter; one of either |
verb |
a verbosity argument indicating how much information about the optimization
steps should be printed to the screen; |
mleGP and mleGPsep perform marginal (or profile) inference for the specified param, either the lengthscale or the nugget. mleGPsep can perform simultaneous lengthscale and nugget inference via a common gradient with param = "both". More details are provided below.

For the lengthscale, mleGP uses a Newton-like scheme with analytic first and second derivatives (more below) to find the scalar parameter for the isotropic Gaussian correlation function, with a hard-coded 100-iteration maximum and a sqrt(.Machine$double.eps) tolerance for determining convergence; mleGPsep.R uses L-BFGS-B via optim for the vectorized parameter of the separable Gaussian correlation, with a user-supplied maximum number of iterations (maxit) passed to optim. When maxit is reached, the output conv = 1 is returned; subsequent identical calls to mleGPsep.R can be used to continue with further iterations. mleGPsep is similar, but uses the C entry point lbfgsb.
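A sketch of such a restart after maxit is reached, assuming gpsepi references a separable GP allocated with dK=TRUE and that dmin and dmax are sensible lengthscale bounds (e.g., taken from darg); these object names are placeholders, not package defaults:

## keep calling mleGPsep.R until the optimizer reports convergence (conv == 0)
fit <- mleGPsep.R(gpsepi, param="d", tmin=dmin, tmax=dmax, maxit=20)
while(fit$conv == 1)
  fit <- mleGPsep.R(gpsepi, param="d", tmin=dmin, tmax=dmax, maxit=20)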
For the nugget, both mleGP
and mleGPsep
utilize a (nearly
identical) Newton-like scheme leveraging first and second derivatives.
jmleGP
and jmleGPsep
provide joint inference by iterating
over the marginals of lengthscale and nugget. The jmleGP.R
function
is an R-only wrapper around
mleGP
(which is primarily in C), whereas jmleGP
is
primarily in C but with reduced output and
with hard-coded N=100
. The same is true for jmleGPsep
.
mleGPsep
provides a param = "both"
alternative to jmleGPsep
leveraging a common gradient. It can be helpful to supply a larger maxit
argument compared to jmleGPsep
as the latter may do up to 100 outer
iterations, cycling over lengthscale and nugget. mleGPsep
usually requires
many fewer total iterations, unless one of the lengthscale or nugget is
already converged. In anticipation of param = "both"
the
mleGPsep
function has longer default values for its bounds and prior
arguments. These longer arguments are
ignored when param != "both"
. At this time mleGP
does not have a
param = "both"
option.
All methods are initialized at the value of the parameter(s) currently
stored by the C-side object referenced by gpi
or gpsepi
. It is
highly recommended that sensible range values (tmin
, tmax
or drange
, grange
) be provided. The defaults provided are
too loose for most applications. As illustrated in the examples below,
the darg
and garg
functions can be used
to set appropriate ranges from the distributions of inputs and output
data respectively.
The Newton-like method implemented for the isotropic lengthscale and for the
nugget offers very fast convergence to local maxima, but sometimes it fails
to converge (for any of the usual reasons). The implementation
detects this, and in such cases it invokes a Brent_fmin
call instead -
this is the method behind the optimize
function.
Note that the gpi or gpsepi object(s) must have been allocated with dK = TRUE; alternatively, one can call buildKGP or buildKGPsep, though these are not in the NAMESPACE at this time.
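A minimal sketch of the dK requirement, reusing the motorcycle-data objects X, Z, d and g from the example below:

## derivative info must be requested at allocation time for lengthscale MLEs
gpi <- newGP(X, Z, d=d$start, g=g$start, dK=TRUE)  ## dK=TRUE is essential here
mleGP(gpi, param="d", tmin=d$min, tmax=d$max)
deleteGP(gpi)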
A self-explanatory data.frame
is returned containing the
values inferred and the number of iterations used. The
jmleGP.R
and jmleGPsep.R
functions will also show progress details (the values
obtained after each iteration over the marginals).
However, the most important “output” is the modified GP object which retains the setting of the parameters reported on output as a side effect.
mleGPsep and jmleGPsep provide an output field/column called conv indicating convergence (when 0), or alternately a value agreeing with a non-convergence code provided on output by optim.
Robert B. Gramacy [email protected]
For standard GP inference, refer to any graduate text, e.g., Rasmussen & Williams Gaussian Processes for Machine Learning, or
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 5.) https://bobby.gramacy.com/surrogates/
vignette("laGP")
,
newGP
, laGP
, llikGP
, optimize
## a simple example with estimated nugget
if(require("MASS")) {

  ## motorcycle data and predictive locations
  X <- matrix(mcycle[,1], ncol=1)
  Z <- mcycle[,2]

  ## get sensible ranges
  d <- darg(NULL, X)
  g <- garg(list(mle=TRUE), Z)

  ## initialize the model
  gpi <- newGP(X, Z, d=d$start, g=g$start, dK=TRUE)

  ## separate marginal inference (not necessary - for illustration only)
  print(mleGP(gpi, "d", d$min, d$max))
  print(mleGP(gpi, "g", g$min, g$max))

  ## joint inference (could skip straight to here)
  print(jmleGP(gpi, drange=c(d$min, d$max), grange=c(g$min, g$max)))

  ## plot the resulting predictive surface
  N <- 100
  XX <- matrix(seq(min(X), max(X), length=N), ncol=1)
  p <- predGP(gpi, XX, lite=TRUE)
  plot(X, Z, main="stationary GP fit to motorcycle data")
  lines(XX, p$mean, lwd=2)
  lines(XX, p$mean+1.96*sqrt(p$s2*p$df/(p$df-2)), col=2, lty=2)
  lines(XX, p$mean-1.96*sqrt(p$s2*p$df/(p$df-2)), col=2, lty=2)

  ## clean up
  deleteGP(gpi)
}

##
## with a separable correlation function
##

## 2D Example: GoldPrice Function, mimicking GP_fit package
f <- function(x)
{
  x1 <- 4*x[,1] - 2
  x2 <- 4*x[,2] - 2
  t1 <- 1 + (x1 + x2 + 1)^2*(19 - 14*x1 + 3*x1^2 - 14*x2 + 6*x1*x2 + 3*x2^2)
  t2 <- 30 + (2*x1 - 3*x2)^2*(18 - 32*x1 + 12*x1^2 + 48*x2 - 36*x1*x2 + 27*x2^2)
  y <- t1*t2
  return(y)
}

## build design
library(tgp)
n <- 50 ## change to 100 or 1000 for more interesting demo
B <- rbind(c(0,1), c(0,1))
X <- dopt.gp(n, Xcand=lhs(10*n, B))$XX

## this differs from GP_fit in that we use the log response
Y <- log(f(X))

## get sensible ranges
d <- darg(NULL, X)
g <- garg(list(mle=TRUE), Y)

## build GP and jointly optimize via profile methods
gpisep <- newGPsep(X, Y, d=rep(d$start, 2), g=g$start, dK=TRUE)
jmleGPsep(gpisep, drange=c(d$min, d$max), grange=c(g$min, g$max))

## clean up
deleteGPsep(gpisep)

## alternatively, we can optimize via a combined gradient
gpisep <- newGPsep(X, Y, d=rep(d$start, 2), g=g$start, dK=TRUE)
mleGPsep(gpisep, param="both", tmin=c(d$min, g$min), tmax=c(d$max, g$max))
deleteGPsep(gpisep)
Build a Gaussian process C-side object based on the X-Z data and parameters provided, and augment those objects with new data.
newGP(X, Z, d, g, dK = FALSE)
newGPsep(X, Z, d, g, dK = FALSE)
updateGP(gpi, X, Z, verb = 0)
updateGPsep(gpsepi, X, Z, verb = 0)
X |
a |
Z |
a vector of responses/dependent values with |
d |
a positive scalar lengthscale parameter for an isotropic Gaussian
correlation function ( |
g |
a positive scalar nugget parameter |
dK |
a scalar logical indicating whether or not derivative information
(for the lengthscale) should be maintained by the GP object;
this is required for calculating MLEs/MAPs of the lengthscale
parameter(s) via |
gpi |
a C-side GP object identifier (positive integer); e.g., as returned by |
gpsepi |
similar to |
verb |
a non-negative integer indicating the verbosity level. A positive value
causes progress statements to be printed to the screen for each
update of |
newGP allocates a new GP object on the C-side and returns its unique integer identifier (gpi), taking time which is cubic in nrow(X); allocated GP objects must (eventually) be destroyed with deleteGP or deleteGPs or memory will leak. The same applies for newGPsep, except deploying a separable correlation with a limited feature set; see deleteGPsep and deleteGPseps.

updateGP takes a gpi identifier as input and augments that GP with new data. A sequence of updates is performed, one for each i in 1:nrow(X), each taking time which is quadratic in the number of data points. updateGP also updates any statistics needed in order to quickly search for new local design candidates via laGP. The same applies to updateGPsep on gpsepi objects.
newGP and newGPsep create a unique GP indicator (gpi or gpsepi) referencing a C-side object; updateGP and updateGPsep do not return anything, but yield a modified C-side object as a side effect.
Robert B. Gramacy [email protected]
For standard GP inference, refer to any graduate text, e.g., Rasmussen & Williams Gaussian Processes for Machine Learning, or
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 6.) https://bobby.gramacy.com/surrogates/
For efficient updates of GPs, see:
R.B. Gramacy and D.W. Apley (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), pp. 561-578; preprint on arXiv:1303.0383; https://arxiv.org/abs/1303.0383
vignette("laGP")
,
deleteGP
, mleGP
, predGP
, laGP
## for more examples, see predGP and mleGP docs

## simple sine data
X <- matrix(seq(0,2*pi,length=7), ncol=1)
Z <- sin(X)

## new GP fit
gpi <- newGP(X, Z, 2, 0.000001)

## make predictions
XX <- matrix(seq(-1,2*pi+1, length=499), ncol=ncol(X))
p <- predGP(gpi, XX)

## sample from the predictive distribution
if(require(mvtnorm)) {
  N <- 100
  ZZ <- rmvt(N, p$Sigma, p$df)
  ZZ <- ZZ + t(matrix(rep(p$mean, N), ncol=N))
  matplot(XX, t(ZZ), col="gray", lwd=0.5, lty=1, type="l",
          xlab="x", ylab="f-hat(x)", bty="n")
  points(X, Z, pch=19, cex=2)
}

## update with four more points
X2 <- matrix(c(pi/2, 3*pi/2, -0.5, 2*pi+0.5), ncol=1)
Z2 <- sin(X2)
updateGP(gpi, X2, Z2)

## make a new set of predictions
p2 <- predGP(gpi, XX)
if(require(mvtnorm)) {
  ZZ <- rmvt(N, p2$Sigma, p2$df)
  ZZ <- ZZ + t(matrix(rep(p2$mean, N), ncol=N))
  matplot(XX, t(ZZ), col="gray", lwd=0.5, lty=1, type="l",
          xlab="x", ylab="f-hat(x)", bty="n")
  points(X, Z, pch=19, cex=2)
  points(X2, Z2, pch=19, cex=2, col=2)
}

## clean up
deleteGP(gpi)
Uses a surrogate-modeled augmented Lagrangian (AL) system to optimize an objective function (blackbox, or known and linear) under multiple unknown (blackbox) constraints via expected improvement (EI) and variations; a comparator based on EI with constraints is also provided
optim.auglag(fn, B, fhat = FALSE, equal = FALSE, ethresh = 1e-2,
  slack = FALSE, cknown = NULL, start = 10, end = 100, Xstart = NULL,
  sep = TRUE, ab = c(3/2, 8), lambda = 1, rho = NULL, urate = 10,
  ncandf = function(t) { t }, dg.start = c(0.1, 1e-06),
  dlim = sqrt(ncol(B)) * c(1/100, 10), Bscale = 1, ey.tol = 1e-2,
  N = 1000, plotprog = FALSE, verb = 2, ...)
optim.efi(fn, B, fhat = FALSE, cknown = NULL, start = 10, end = 100,
  Xstart = NULL, sep = TRUE, ab = c(3/2, 8), urate = 10,
  ncandf = function(t) { t }, dg.start = c(0.1, 1e-6),
  dlim = sqrt(ncol(B)) * c(1/100, 10), Bscale = 1, plotprog = FALSE,
  verb = 2, ...)
fn |
function of an input ( |
B |
2-column |
fhat |
a scalar logical indicating if the objective function should
be modeled with a GP surrogate. The default of |
equal |
an optional vector containing zeros and ones, whose length equals the number of
constraints, specifying which should be treated as equality constraints ( |
ethresh |
a threshold used for equality constraints to determine validity for progress measures; ignored if there are no equality constraints |
slack |
A scalar logical indicating if slack variables, and thus exact EI
calculations should be used. The default of |
cknown |
A optional positive integer vector specifying which of the constraint
values returned by |
start |
positive integer giving the number of random starting locations before
sequential design (for optimization) is performed; |
end |
positive integer giving the total number of evaluations/trials in the
optimization; must have |
Xstart |
optional matrix of starting design locations in lieu of, or in addition to,
|
sep |
The default |
ab |
prior parameters; see |
lambda |
|
rho |
positive scalar initial quadratic penalty parameter in the augmented Lagrangian; the default setting of |
urate |
positive integer indicating how many optimization trials should pass before each MLE/MAP update is performed for GP correlation lengthscale parameter(s) |
ncandf |
function taking a single integer indicating the optimization trial number |
dg.start |
2-vector giving starting values for the lengthscale and nugget parameters of the GP surrogate model(s) for constraints |
dlim |
2-vector giving bounds for the lengthscale parameter(s) under MLE/MAP inference,
thereby augmenting the prior specification in |
Bscale |
scalar indicating the relationship between the sum of the inputs, |
ey.tol |
a scalar proportion indicating how many of the EIs
at |
N |
positive scalar integer indicating the number of Monte Carlo samples to be used for calculating EI and EY |
plotprog |
|
verb |
a non-negative integer indicating the verbosity level; the larger the value the more that is printed to the screen |
... |
additional arguments passed to |
These subroutines support a suite of methods used to optimize challenging constrained problems from Gramacy, et al. (2016) and Picheny, et al. (2016); see the references below. Those schemes hybridize Gaussian process based surrogate modeling and expected improvement (EI; Jones, et al., 1998) with the additive penalty method (APM) implemented by the augmented Lagrangian (AL, e.g., Nocedal & Wright, 2006).

The goal is to minimize a (possibly known) linear objective function f(x) under multiple, unknown (blackbox) constraint functions satisfying c(x) <= 0, where c(x) is vector-valued. The solution here emulates the components of c with Gaussian process surrogates, and guides optimization by searching the posterior mean surface, or the EI of, the following composite objective

Y(x) = f(x) + lambda' Yc(x) + (1/(2 rho)) sum_j max(0, Yc_j(x))^2,

where lambda and rho follow updating equations that guarantee convergence to a valid solution minimizing the objective. For more details, see Gramacy, et al. (2016).
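As an illustrative sketch (not part of the package API), the AL composite above can be evaluated directly in R; the f and C values below are hypothetical objective and constraint evaluations at a single input, and lambda and rho stand in for the current AL parameters:

## augmented Lagrangian composite: f + lambda'c + (1/(2 rho)) sum max(0, c_j)^2
al.composite <- function(f, C, lambda, rho)
  f + sum(lambda * C) + sum(pmax(0, C)^2) / (2 * rho)
al.composite(f=1.2, C=c(-0.3, 0.1), lambda=c(1, 1), rho=0.5)  ## second constraint violated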
A slack variable implementation that allows for exact EI calculations and can accommodate equality constraints, and mixed (equality and inequality) constraints, is also provided. For further details, see Picheny, et al. (2016).
The example below illustrates a variation on the toy problem considered in both papers, which bases sampling on EI. For examples making use of equality constraints, following the Picheny, et al. (2016) paper, see the demos listed in the “See Also” section below.
Although it is off by default, these functions allow an unknown objective to
be modeled (fhat = TRUE
), rather than assuming a known linear one. For examples see
demo("ALfhat")
and demo("GSBP")
which illustrate the AL and comparators
in inequality and mixed constraints setups, respectively.
The optim.efi function is provided as a comparator. This method uses the same underlying GP models together with the hybrid EI and probability-of-satisfying-the-constraints heuristic from Schonlau, et al. (1998). See demo("GSBP") and demo("LAH") for optim.efi examples and comparisons between the original AL and the slack variable enhancement(s) on mixed constraint problems with known and blackbox objectives, respectively.
The output is a list
summarizing the progress of the evaluations of the
blackbox under optimization
prog |
vector giving the best valid ( |
obj |
vector giving the value of the objective for the input under consideration at each trial |
X |
|
C |
|
d |
|
df |
if |
lambda |
a |
rho |
a vector of |
This function is under active development, especially the newest features including separable GP surrogate modeling, surrogate modeling of a blackbox objective, and the use of slack variables for exact EI calculations and the support of equality constraints. Also note that, compared with earlier versions, it is now required to augment your blackbox function (fn) with an argument named known.only. This allows the user to specify if a potentially different object (with a subset of the outputs, those that are “known”) gets returned in certain circumstances. For example, the objective is treated as known in many of our examples. When a non-null cknown object is used, the known.only flag can be used to return only the outputs which are known.

Older versions of this function provided an argument called nomax. The NoMax feature is no longer supported.
Robert B. Gramacy [email protected]
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 7.) https://bobby.gramacy.com/surrogates/
Picheny, V., Gramacy, R.B., Wild, S.M., Le Digabel, S. (2016). “Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian”. Preprint available on arXiv:1605.09466; https://arxiv.org/abs/1605.09466
Gramacy, R.B, Gray, G.A, Lee, H.K.H, Le Digabel, S., Ranjan P., Wells, G., Wild, S.M. (2016) “Modeling an Augmented Lagrangian for Improved Blackbox Constrained Optimization”, Technometrics (with discussion), 58(1), 1-11. Preprint available on arXiv:1403.4890; https://arxiv.org/abs/1403.4890
Jones, D., Schonlau, M., and Welch, W. J. (1998). “Efficient Global Optimization of Expensive Black Box Functions.” Journal of Global Optimization, 13, 455-492.
Schonlau, M., Jones, D.R., and Welch, W. J. (1998). “Global Versus Local Search in Constrained Optimization of Computer Models.” In New Developments and Applications in Experimental Design, vol. 34, 11-25. Institute of Mathematical Statistics.
Nocedal, J. and Wright, S.J. (2006). Numerical Optimization. 2nd ed. Springer.
vignette("laGP")
, demo("ALfhat")
for blackbox objective,
demo("GSBP")
for a mixed constraints problem with blackbox objective,
demo("LAH")
for mix constraints with known objective,
optim.step.tgp
for unconstrained optimization;
optim
with method="L-BFGS-B"
for box constraints, or
optim
with method="SANN"
for simulated annealing
## this example assumes a known linear objective; further examples
## are in the optim.auglag demo

## a test function returning linear objective evaluations and
## non-linear constraints
aimprob <- function(X, known.only = FALSE)
{
  if(is.null(nrow(X))) X <- matrix(X, nrow=1)
  f <- rowSums(X)
  if(known.only) return(list(obj=f))
  c1 <- 1.5-X[,1]-2*X[,2]-0.5*sin(2*pi*(X[,1]^2-2*X[,2]))
  c2 <- rowSums(X^2)-1.5
  return(list(obj=f, c=cbind(c1,c2)))
}

## set bounding rectangle for adaptive sampling
B <- matrix(c(rep(0,2),rep(1,2)),ncol=2)

## optimization (primarily) by EI, change 25 to 100 for
## 99% chance of finding the global optimum with value 0.6
if(require("interp")) { ## for plotprog=interp
  out <- optim.auglag(aimprob, B, end=25, plotprog=interp)
} else {
  out <- optim.auglag(aimprob, B, end=25)
}

## using the slack variable implementation which is a little slower
## but more precise; slack=2 augments random search with L-BFGS-B
out2 <- optim.auglag(aimprob, B, end=25, slack=TRUE)

## Not run: 
out3 <- optim.auglag(aimprob, B, end=25, slack=2)

## End(Not run)

## for more slack examples and comparison to optim.efi on problems
## involving equality and mixed (equality and inequality) constraints,
## see demo("ALfhat"), demo("GSBP") and demo("LAH")

## for comparison, here is a version that uses simulated annealing
## with the Additive Penalty Method (APM) for constraints
## Not run: 
aimprob.apm <- function(x, B=matrix(c(rep(0,2),rep(1,2)),ncol=2))
{
  ## check bounding box
  for(i in 1:length(x)) {
    if(x[i] < B[i,1] || x[i] > B[i,2]) return(Inf)
  }

  ## evaluate objective and constraints
  f <- sum(x)
  c1 <- 1.5-x[1]-2*x[2]-0.5*sin(2*pi*(x[1]^2-2*x[2]))
  c2 <- x[1]^2+x[2]^2-1.5

  ## return APM composite
  return(f + abs(c1) + abs(c2))
}

## use SA; specify control=list(maxit=100), say, to control max
## number of iterations; does not easily facilitate plotting progress
out4 <- optim(runif(2), aimprob.apm, method="SANN")

## check the final value, which typically does not satisfy both
## constraints
aimprob(out4$par)

## End(Not run)

## for a version with a modeled objective see demo("ALfhat")
Perform Gaussian process prediction (under an isotropic or separable formulation) at new XX locations using a GP object stored on the C-side
predGP(gpi, XX, lite = FALSE, nonug = FALSE)
predGPsep(gpsepi, XX, lite = FALSE, nonug = FALSE)
gpi |
a C-side GP object identifier (positive integer);
e.g., as returned by |
gpsepi |
similar to |
XX |
a |
lite |
a scalar logical indicating whether ( |
nonug |
a scalar logical indicating if a (nonzero) nugget should be used in the predictive
equations; this allows the user to toggle between visualizations of uncertainty due just
to the mean, and a full quantification of predictive uncertainty. The latter (default |
Returns the parameters of Student-t predictive equations. By default, these include a full predictive covariance matrix between all XX locations. However, this can be slow when nrow(XX) is large, so a lite option is provided, which only returns the diagonal of that matrix.

GP prediction is sometimes called “kriging”, especially in the spatial statistics literature, so this function could also be described as returning evaluations of the “kriging equations”.
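A small sketch of the lite toggle, reusing the gpi and XX objects from the example below (where nrow(XX) is 400):

p.full <- predGP(gpi, XX)             ## $Sigma is a 400 x 400 covariance matrix
p.lite <- predGP(gpi, XX, lite=TRUE)  ## $s2 holds only the 400 diagonal entries
c(dim(p.full$Sigma), length(p.lite$s2))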
The output is a list
with the following components.
mean |
a vector of predictive means of length |
Sigma |
covariance matrix
for a multivariate Student-t distribution; alternately
if |
df |
a Student-t degrees of freedom scalar (applies to all
|
Robert B. Gramacy [email protected]
For standard GP prediction, refer to any graduate text, e.g., Rasmussen & Williams Gaussian Processes for Machine Learning, or
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 5.) https://bobby.gramacy.com/surrogates/
vignette("laGP")
,
newGP
, mleGP
, jmleGP
,
## a "computer experiment" -- a much smaller version than the one shown ## in the aGP docs ## Simple 2-d test function used in Gramacy & Apley (2015); ## thanks to Lee, Gramacy, Taddy, and others who have used it before f2d <- function(x, y=NULL) { if(is.null(y)) { if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2) y <- x[,2]; x <- x[,1] } g <- function(z) return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1))) z <- -g(x)*g(y) } ## design with N=441 x <- seq(-2, 2, length=11) X <- expand.grid(x, x) Z <- f2d(X) ## fit a GP gpi <- newGP(X, Z, d=0.35, g=1/1000) ## predictive grid with NN=400 xx <- seq(-1.9, 1.9, length=20) XX <- expand.grid(xx, xx) ZZ <- f2d(XX) ## predict p <- predGP(gpi, XX, lite=TRUE) ## RMSE: compare to similar experiment in aGP docs sqrt(mean((p$mean - ZZ)^2)) ## visualize the result par(mfrow=c(1,2)) image(xx, xx, matrix(p$mean, nrow=length(xx)), col=heat.colors(128), xlab="x1", ylab="x2", main="predictive mean") image(xx, xx, matrix(p$mean-ZZ, nrow=length(xx)), col=heat.colors(128), xlab="x1", ylab="x2", main="bas") ## clean up deleteGP(gpi) ## see the newGP and mleGP docs for examples using lite = FALSE for ## sampling from the joint predictive distribution
## a "computer experiment" -- a much smaller version than the one shown ## in the aGP docs ## Simple 2-d test function used in Gramacy & Apley (2015); ## thanks to Lee, Gramacy, Taddy, and others who have used it before f2d <- function(x, y=NULL) { if(is.null(y)) { if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2) y <- x[,2]; x <- x[,1] } g <- function(z) return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1))) z <- -g(x)*g(y) } ## design with N=441 x <- seq(-2, 2, length=11) X <- expand.grid(x, x) Z <- f2d(X) ## fit a GP gpi <- newGP(X, Z, d=0.35, g=1/1000) ## predictive grid with NN=400 xx <- seq(-1.9, 1.9, length=20) XX <- expand.grid(xx, xx) ZZ <- f2d(XX) ## predict p <- predGP(gpi, XX, lite=TRUE) ## RMSE: compare to similar experiment in aGP docs sqrt(mean((p$mean - ZZ)^2)) ## visualize the result par(mfrow=c(1,2)) image(xx, xx, matrix(p$mean, nrow=length(xx)), col=heat.colors(128), xlab="x1", ylab="x2", main="predictive mean") image(xx, xx, matrix(p$mean-ZZ, nrow=length(xx)), col=heat.colors(128), xlab="x1", ylab="x2", main="bas") ## clean up deleteGP(gpi) ## see the newGP and mleGP docs for examples using lite = FALSE for ## sampling from the joint predictive distribution
Generate two-dimensional random paths (one-dimensional manifolds in 2d) composed of different randomly chosen line types: linear, quadratic, cubic, exponential, and natural logarithm. If the input dimensionality is higher than 2, then a line in two randomly chosen input coordinates is generated
randLine(a, D, N, smin, res)
a |
a fixed two-element vector denoting the range of the bounding box (lower bound and upper bound) of all input coordinates |
D |
a scalar denoting the dimensionality of input space |
N |
a scalar denoting the desired total number of random lines |
smin |
a scalar denoting the minimum absolute scaling constant, i.e., the length of the shortest line that could be generated |
res |
a scalar denoting the number of data points, i.e., the resolution on the random path |
This two-dimensional random line generating function produces different types of 2d random paths, including linear, quadratic, cubic, exponential, and natural logarithm.

First, one of these line types is chosen uniformly at random. The line is then drawn, via a collection of discrete points, from the origin according to the arguments, e.g., resolution and length, provided by the user. The discrete set of coordinates is then shifted and scaled, uniformly at random, into the specified 2d rectangle (determined by a), with the restriction that at least half of the points comprising the line lie within the rectangle.
For a quick visualization, see Figure 15 in Sun, et al. (2017). Figure 7
in the same manuscript illustrates the application of this function in
out-of-sample prediction using laGPsep
, in 2d
and 4d
, respectively.
randLine
returns different types of random paths and the indices of
the randomly selected pair, i.e., subset, of input coordinates (when D > 2
).
randLine
returns a list
of list
s. The outer list is of
length six, representing each of the five possible line types (linear, quadratic,
cubic, exponential, and natural logarithm), with the sixth entry providing the
randomly chosen input dimensions.
The inner list
s are comprised of
data.frame
s, the number of which span N
samples across all
inner list
s.
Users should scale each coordinate of the global input space to a common coded range in order to avoid the computational burden caused by passing a global input space argument. Users may convert back to the natural units when necessary.
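A minimal sketch of such pre-scaling (X here is a hypothetical design matrix in natural units, not an object defined by the package):

## map each input coordinate to a common coded range, here [0, 1]
X.coded <- apply(X, 2, function(x) (x - min(x)) / (max(x) - min(x)))
## convert predictions/locations back to natural units afterwards as needed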
Furong Sun [email protected] and Robert B. Gramacy [email protected]
F. Sun, R.B. Gramacy, B. Haaland, E. Lawrence, and A. Walker (2019). Emulating satellite drag from large simulation experiments, SIAM/ASA Journal on Uncertainty Quantification, 7(2), pp. 720-759; preprint on arXiv:1712.00182; https://arxiv.org/abs/1712.00182
## 1. visualization of the randomly generated paths

## generate the paths
D <- 4
a <- c(-2, 2)
N <- 30
smin <- 0.1
res <- 100
line.set <- randLine(a=a, D=D, N=N, smin=smin, res=res)

## the indices of the randomly selected pair of input coordinates
d <- line.set$d

## visualization
## first create an empty plot
par(mar=c(5, 4, 6, 2) + 0.1)
plot(0, xlim=a, ylim=a, type="l", xlab=paste("factor ", d[1], sep=""),
  ylab=paste("factor ", d[2], sep=""), main="2d random paths",
  cex.lab=1.5, cex.main=2)
abline(h=(a[1]+a[2])/2, v=(a[1]+a[2])/2, lty=2)

## merge each path type together
W <- unlist(list(line.set$lin, line.set$qua, line.set$cub, line.set$ep,
  line.set$ln), recursive=FALSE)

## calculate colors to retain
n <- unlist(lapply(line.set, length)[-6])
cols <- rep(c("orange", "blue", "forestgreen", "magenta", "cornflowerblue"), n)

## plot randomly generated paths with a centering dot in red at the midway point
for(i in 1:N){
  lines(W[[i]][,1], W[[i]][,2], col=cols[i])
  points(W[[i]][res/2,1], W[[i]][res/2,2], col=2, pch=20)
}

## add legend
legend("top", legend=c("lin", "qua", "cub", "exp", "log"), cex=1.5, bty="n",
  xpd=TRUE, horiz=TRUE, inset=c(0, -0.085), lty=rep(1, 5), lwd=rep(1, 5),
  col=c("orange", "blue", "forestgreen", "magenta", "cornflowerblue"))

## 2. use the random paths for out-of-sample prediction via laGPsep

## test function (same 2d function as in other package examples),
## ignoring the 4d nature of the path generation above
f2d <- function(x, y=NULL){
  if(is.null(y)){
    if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2)
    y <- x[,2]; x <- x[,1]
  }
  g <- function(z) return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1)))
  z <- -g(x)*g(y)
}

## generate training data using the 2d input space
x <- seq(a[1], a[2], by=0.02)
X <- as.matrix(expand.grid(x, x))
Y <- f2d(X)

## example of a joint path calculation followed by an RMSE calculation
## on one randomly chosen path
WW <- W[[sample(1:N, 1)]]
WY <- f2d(WW)

## gradient-based "joint" local design search via ALC ("alcopt")
j.exh <- laGPsep(WW, 6, 100, X, Y, method="alcopt", close=10000, lite=FALSE)
sqrt(mean((WY - j.exh$mean)^2)) ## RMSE

## repeat for all thirty path elements (way too slow for checking) and other
## local design choices, and visualize the RMSE distribution(s) side-by-side
## Not run: 
## pre-allocate to save RMSE
rmse.exh <- rmse.opt <- rmse.nn <- rmse.pw <- rmse.pwnn <- rep(NA, N)
for(t in 1:N){

  WW <- W[[t]]
  WY <- f2d(WW)

  ## joint local design, exhaustive search via ALC
  j.exh <- laGPsep(WW, 6, 100, X, Y, method="alc", close=10000, lite=FALSE)
  rmse.exh[t] <- sqrt(mean((WY - j.exh$mean)^2))

  ## joint local design, gradient-based search via ALC
  j.opt <- laGPsep(WW, 6, 100, X, Y, method="alcopt", close=10000, lite=FALSE)
  rmse.opt[t] <- sqrt(mean((WY - j.opt$mean)^2))

  ## joint local design via NN
  j.nn <- laGPsep(WW, 6, 100, X, Y, method="nn", close=10000, lite=FALSE)
  rmse.nn[t] <- sqrt(mean((WY - j.nn$mean)^2))

  ## pointwise local design via ALC
  pw <- aGPsep(X, Y, WW, start=6, end=50, d=list(max=20), method="alc", verb=0)
  rmse.pw[t] <- sqrt(mean((WY - pw$mean)^2))

  ## pointwise local design via NN
  pw.nn <- aGPsep(X, Y, WW, start=6, end=50, d=list(max=20), method="nn", verb=0)
  rmse.pwnn[t] <- sqrt(mean((WY - pw.nn$mean)^2))

  ## progress meter
  print(t)
}

## set a common y range for plotting
ylim_RMSE <- log(range(rmse.exh, rmse.opt, rmse.nn, rmse.pw, rmse.pwnn))

## plot the distribution of RMSE output
boxplot(log(rmse.exh), log(rmse.opt), log(rmse.nn), log(rmse.pw), log(rmse.pwnn),
  xaxt='n', xlab="", ylab="log(RMSE)", ylim=ylim_RMSE, main="")
axis(1, at=1:5, labels=c("ALC-ex", "ALC-opt", "NN", "ALC-pw", "NN-pw"), las=1)

## End(Not run)
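As the commented-out loop above notes, iterating over all N paths serially is slow. A minimal sketch (not part of the package examples) of spreading that per-path work over cores with parallel::mclapply is given below for one of the comparators; it assumes W, X, Y, N, and f2d from the block above are already defined, and that two cores are available (on Windows, mclapply silently falls back to serial execution).

## sketch: distribute the per-path RMSE loop over cores with parallel::mclapply
## (assumes W, X, Y, N, and f2d from the example above are in the workspace)
library(parallel)
rmse.exh <- unlist(mclapply(1:N, function(t) {
  WW <- W[[t]]                   ## t-th random path
  WY <- f2d(WW)                  ## true responses along the path
  ## joint local design, exhaustive ALC search, as in the loop above
  fit <- laGPsep(WW, 6, 100, X, Y, method="alc", close=10000, lite=FALSE)
  sqrt(mean((WY - fit$mean)^2))  ## per-path RMSE
}, mc.cores=2))

The same wrapper applies to the other comparators (method="alcopt", method="nn", or the pointwise aGPsep calls); alternatively, aGPsep's own omp.threads argument parallelizes within a single pointwise call when the package is compiled with OpenMP support.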