General-Purpose-Optimization
lessSEM can be used for regularized SEM and for general-purpose optimization. That is, you can use all optimizers and penalty functions implemented in lessSEM for your own models. To this end, you must define a fitting function, i.e., a function that takes in the parameters and returns a single value: the unregularized fit. lessSEM uses this fitting function and adds the penalty terms. The combined fitting function is then optimized. Currently, there are four ways to use the optimizers in lessSEM:
1. You can use the R interface. This interface is very similar to that of optim (see, e.g., ?lessSEM::gpLasso).
2. If your functions are defined in C++, you can use a faster interface which is a bit more involved (see, e.g., ?lessSEM::gpLassoCpp).
3. You can include the header files of lessSEM in your package to interface directly with the underlying C++ functions. This is the most complicated approach.
4. The optimizers are also implemented in the separate C++ header-only library lesstimate, which can be used as a submodule in R packages.
In general, the approaches get faster as you move from 1 to 4. However, you will see the largest performance gains from implementing a gradient function in addition to the fitting function. As a rule of thumb: use approach 1 if you intend to run your model only a few times, don’t want to create a new package, and your model runs fairly fast. Use approach 2 if you want to increase the speed a bit while keeping the changes to your files manageable. Use approach 3 or 4 if you are creating a new package, have some experience with RcppArmadillo, and want the best performance.
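For the lasso penalty, the combined fitting function described above can be sketched as follows. Here, f is your unregularized fitting function, θ is the parameter vector, λ is the tuning parameter, and R is the set of regularized parameters; these symbols are introduced only for illustration:

$$
F(\boldsymbol{\theta}) = f(\boldsymbol{\theta}) + \lambda \sum_{j \in \mathcal{R}} |\theta_j|
$$

Setting λ = 0 recovers the unregularized fit, while larger values of λ pull the regularized parameters towards zero. Parameters outside R (such as an intercept) are not penalized.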
In the following, we will demonstrate these approaches using a linear regression model as an example.
The example
Let’s start by setting up our linear regression model. To this end, we will simulate a data set:
The first approach: Interfacing from R
We will now implement a lasso regularized linear regression using the gpLasso interface. This interface is very similar to optim. To use it, we must define our fitting function in R:
# defining the sum-squared-errors:
sseFun <- function(par, y, X, N){
# par is the parameter vector
# y is the observed dependent variable
# X is the design matrix
# N is the sample size
pred <- X %*% matrix(par, ncol = 1) #be explicit here:
# we need par to be a column vector
sse <- sum((y - pred)^2)
# we scale with .5/N to get the same results as glmnet
return((.5/N)*sse)
}
Additionally, we need a labeled vector with starting values:
par <- rep(0, p+1)
names(par) <- paste0("b", 0:p)
print(par)
#> b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 b10
#> 0 0 0 0 0 0 0 0 0 0 0
Note that we defined one more parameter than there are variables in X. This is because we also want to estimate the intercept. To this end, we extend X with a leading column of ones:
Xext <- cbind(1,X)
head(Xext)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 -0.56047565 -0.71040656 2.1988103 -0.7152422 -0.07355602 -0.60189285
#> [2,] 1 -0.23017749 0.25688371 1.3124130 -0.7526890 -1.16865142 -0.99369859
#> [3,] 1 1.55870831 -0.24669188 -0.2651451 -0.9385387 -0.63474826 1.02678506
#> [4,] 1 0.07050839 -0.34754260 0.5431941 -1.0525133 -0.02884155 0.75106130
#> [5,] 1 0.12928774 -0.95161857 -0.4143399 -0.4371595 0.67069597 -1.50916654
#> [6,] 1 1.71506499 -0.04502772 -0.4762469 0.3311792 -1.65054654 -0.09514745
#> [,8] [,9] [,10] [,11]
#> [1,] 1.07401226 -0.7282191 0.3562833 -1.0141142
#> [2,] -0.02734697 -1.5404424 -0.6580102 -0.7913139
#> [3,] -0.03333034 -0.6930946 0.8552022 0.2995937
#> [4,] -1.51606762 0.1188494 1.1529362 1.6390519
#> [5,] 0.79038534 -1.3647095 0.2762746 1.0846170
#> [6,] -0.21073418 0.5899827 0.1441047 -0.6245675
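Prepending the column of ones is what turns b0 into an intercept. For person i, the prediction computed in sseFun is

$$
\hat{y}_i = 1 \cdot b_0 + \sum_{j=1}^{p} x_{ij}\, b_j
$$

where x_ij denotes the value of person i on the j-th column of the original X.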
Finally, we need to decide which parameters should be regularized and which values of lambda to use. We want to regularize everything except for the intercept:
(regularized <- paste0("b", 1:p))
#> [1] "b1" "b2" "b3" "b4" "b5" "b6" "b7" "b8" "b9" "b10"
lambdas <- seq(0,.1,length.out = 20)
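The call to seq creates 20 evenly spaced values from 0 to .1:

$$
\lambda_k = \frac{k-1}{19} \cdot 0.1, \qquad k = 1, \dots, 20
$$

The second value is therefore 0.1/19 ≈ 0.005263158 and the third 0.2/19 ≈ 0.010526316, which is what appears in the lambda column of the results.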
Now, we are ready to estimate the model:
library(lessSEM)
l1 <- gpLasso(par = par,
regularized = regularized,
fn = sseFun,
lambdas = lambdas,
X = Xext,
y = y,
N = length(y)
)
head(l1@parameters)
#> lambda alpha theta b0 b1 b2 b3 b4
#> 1 0.000000000 1 0 0.02738463 1.0129197 0.9991454 0.9705726 1.027626
#> 2 0.005263158 1 0 0.02935332 1.0043734 0.9908935 0.9626257 1.025138
#> 3 0.010526316 1 0 0.02995012 0.9967094 0.9846670 0.9552794 1.021892
#> 4 0.015789474 1 0 0.03010662 0.9897328 0.9789426 0.9481493 1.018672
#> 5 0.021052632 1 0 0.03029718 0.9827287 0.9732055 0.9409870 1.015363
#> 6 0.026315789 1 0 0.03112519 0.9753355 0.9670621 0.9338615 1.011553
#> b5 b6 b7 b8 b9 b10
#> 1 0.014036259 -0.007461001 0.0185899756 0.021930761 -0.009900029 0.027401040
#> 2 0.003365776 0.000000000 0.0143410112 0.015434958 -0.007939546 0.022297680
#> 3 0.000000000 0.000000000 0.0096219826 0.010707808 -0.005256709 0.017465071
#> 4 0.000000000 0.000000000 0.0049334068 0.006364151 -0.002393543 0.012713199
#> 5 0.000000000 0.000000000 0.0001769463 0.002036748 0.000000000 0.007969729
#> 6 0.000000000 0.000000000 0.0000000000 0.000000000 0.000000000 0.003304089
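Note that several estimates are exactly zero, not just small. A common mechanism by which lasso optimizers produce exact zeros is soft-thresholding, as used in ISTA-type proximal-gradient methods. The following is a minimal plain C++ sketch of one such step under that assumption; it is not lessSEM’s implementation, and the names softThreshold and proximalStep are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Soft-thresholding: the proximal operator of t * |x|.
// Inputs within [-t, t] become exactly zero; all others shrink towards zero by t.
double softThreshold(double x, double t) {
  assert(t >= 0.0);
  if (x > t) return x - t;
  if (x < -t) return x + t;
  return 0.0;
}

// One ISTA-style proximal-gradient step for a lasso objective:
// a plain gradient step on the unregularized fit, followed by
// soft-thresholding of the regularized parameters only.
std::vector<double> proximalStep(const std::vector<double>& par,
                                 const std::vector<double>& gradient,
                                 const std::vector<bool>& isRegularized,
                                 double lambda, double stepSize) {
  assert(par.size() == gradient.size());
  assert(par.size() == isRegularized.size());
  std::vector<double> updated(par.size());
  for (std::size_t i = 0; i < par.size(); ++i) {
    const double candidate = par[i] - stepSize * gradient[i];
    // unregularized parameters (e.g., the intercept b0) are never thresholded
    updated[i] = isRegularized[i] ? softThreshold(candidate, stepSize * lambda)
                                  : candidate;
  }
  return updated;
}
```

For example, with lambda = 0.1 and a step size of 1, a regularized parameter whose gradient step lands at 0.02 is set to exactly 0, while an unregularized intercept at 0.02 is left unchanged.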
Note that we did not specify the gradients of our function. In this case, lessSEM will use numDeriv to approximate the gradients numerically. However, if you can specify the gradients analytically, estimation can be much faster:
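sseGrad <- function(par, y, X, N){
# gradient of (.5/N)*sum((y - X %*% par)^2) with respect to par
gradients <- (-2.0*t(X) %*% y + 2.0*t(X) %*% X %*% matrix(par, ncol = 1))
gradients <- (.5/N)*gradients
# return the gradients as a row vector
return(t(gradients))
}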
l1 <- gpLasso(par = par,
regularized = regularized,
fn = sseFun,
gr = sseGrad,
lambdas = lambdas,
X = Xext,
y = y,
N = length(y)
)
head(l1@parameters)
#> lambda alpha theta b0 b1 b2 b3 b4
#> 1 0.000000000 1 0 0.02738538 1.0129192 0.9991446 0.9705722 1.027627
#> 2 0.005263158 1 0 0.02935293 1.0043730 0.9908934 0.9626257 1.025138
#> 3 0.010526316 1 0 0.02995044 0.9967092 0.9846670 0.9552790 1.021892
#> 4 0.015789474 1 0 0.03010657 0.9897326 0.9789426 0.9481492 1.018672
#> 5 0.021052632 1 0 0.03029714 0.9827286 0.9732060 0.9409871 1.015363
#> 6 0.026315789 1 0 0.03112488 0.9753367 0.9670621 0.9338616 1.011553
#> b5 b6 b7 b8 b9 b10
#> 1 0.014035495 -0.007458693 0.0185899698 0.021930012 -0.009901646 0.027400755
#> 2 0.003365787 0.000000000 0.0143411586 0.015434965 -0.007939552 0.022297718
#> 3 0.000000000 0.000000000 0.0096216331 0.010707813 -0.005256789 0.017465164
#> 4 0.000000000 0.000000000 0.0049331390 0.006364195 -0.002393728 0.012713246
#> 5 0.000000000 0.000000000 0.0001771743 0.002036275 0.000000000 0.007969899
#> 6 0.000000000 0.000000000 0.0000000000 0.000000000 0.000000000 0.003303889
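The gradient function sseGrad follows from differentiating sseFun, the sum of squared errors scaled with 1/(2N). Writing X for the extended design matrix Xext and b for the parameter vector:

$$
f(\mathbf{b}) = \frac{1}{2N}(\mathbf{y} - \mathbf{X}\mathbf{b})^\top(\mathbf{y} - \mathbf{X}\mathbf{b}),
\qquad
\nabla f(\mathbf{b}) = \frac{1}{2N}\left(-2\mathbf{X}^\top\mathbf{y} + 2\mathbf{X}^\top\mathbf{X}\mathbf{b}\right) = \frac{1}{N}\mathbf{X}^\top(\mathbf{X}\mathbf{b} - \mathbf{y})
$$

sseGrad returns this vector transposed, i.e., as a row vector.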
Here is a short comparison of running both models 5 times each:
Runtime in seconds without gradients:
#> [1] 0.2564266 0.2470605 0.2468219 0.2499166 0.2482591
Runtime in seconds with gradients:
#> [1] 0.02321720 0.02883840 0.02212071 0.02288485 0.02416110
That is roughly a tenfold speedup!
Note that you can also pass C++ functions (compiled with Rcpp) to gpLasso, just as we did with the R functions above:
library(RcppArmadillo)
library(Rcpp)
linreg <- '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
double fitfunction(const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N){
// compute the sum of squared errors:
arma::mat sse = arma::trans(y-X*parameters)*(y-X*parameters);
// other packages, such as glmnet, scale the sse with
// 1/(2*N), where N is the sample size. We will do that here as well
sse *= 1.0/(2.0 * N);
// note: We must return a double, but the sse is a matrix
// To get a double, just return the single value that is in
// this matrix:
return(sse(0,0));
}
// [[Rcpp::export]]
arma::rowvec gradientfunction(const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N){
// note: we want to return our gradients as row-vector; therefore,
// we have to transpose the resulting column-vector:
arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*parameters);
// the fit is scaled with 1/(2*N) (as in glmnet), so the
// gradients have to be scaled by the same factor:
gradients *= (.5/N);
return(gradients);
}'
Rcpp::sourceCpp(code = linreg)
Run the model as before:
l1 <- gpLasso(par = par,
regularized = regularized,
fn = fitfunction,
gr = gradientfunction,
lambdas = lambdas,
X = Xext,
y = y,
N = length(y)
)
head(l1@parameters)
#> lambda alpha theta b0 b1 b2 b3 b4
#> 1 0.000000000 1 0 0.02738582 1.0129189 0.9991449 0.9705722 1.027626
#> 2 0.005263158 1 0 0.02935312 1.0043727 0.9908927 0.9626260 1.025139
#> 3 0.010526316 1 0 0.02995020 0.9967090 0.9846669 0.9552793 1.021892
#> 4 0.015789474 1 0 0.03010671 0.9897324 0.9789426 0.9481494 1.018672
#> 5 0.021052632 1 0 0.03029732 0.9827286 0.9732061 0.9409868 1.015363
#> 6 0.026315789 1 0 0.03112459 0.9753371 0.9670621 0.9338616 1.011553
#> b5 b6 b7 b8 b9 b10
#> 1 0.014035802 -0.007459445 0.0185900832 0.021930312 -0.009899840 0.027400855
#> 2 0.003365641 0.000000000 0.0143415484 0.015434674 -0.007939592 0.022297256
#> 3 0.000000000 0.000000000 0.0096218807 0.010707664 -0.005256972 0.017464975
#> 4 0.000000000 0.000000000 0.0049329762 0.006364249 -0.002393580 0.012713275
#> 5 0.000000000 0.000000000 0.0001769284 0.002036489 0.000000000 0.007970293
#> 6 0.000000000 0.000000000 0.0000000000 0.000000000 0.000000000 0.003304421
The runtime in seconds with C++ is:
#> [1] 0.01719499 0.02304626 0.01618743 0.01625514 0.01627016
This is even lower than what we had before!
The second approach: Using C++ function pointers
While using the Rcpp functions defined above was quite fast for our linear regression, it can still be fairly slow for more involved models (e.g., SEM). This is because the optimizer has to go back and forth between R and C++. To reduce this overhead, we can use the second approach: instead of passing an Rcpp function, which is then called through R, we pass a pointer to the underlying C++ function. This approach is more constrained than the one presented above:
- We must define both a fitting function and a gradient function in Rcpp. We cannot rely on numDeriv anymore!
- The fitting function and the gradient function may only take two arguments each: a const Rcpp::NumericVector& (the parameters) and an Rcpp::List& (everything else). While this seems restrictive, note that we can pass virtually anything we want in a list.
- We must create pointers to the fitting function and the gradient function. This can be difficult; however, we provide some guidance below.
This may be a bit overwhelming at first, so we will go through it step by step.
1. Creating a fitting function and a gradient function
We already defined a fitting function and a gradient function for our linear regression model in the example above. However, we often do not know the gradients in closed form. If you don’t have a gradient function, you can try a numerical approximation. More details can be found here.
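As a rough idea of what a numerical approximation involves, here is a minimal central-difference sketch in plain C++ that keeps the (parameters, everything else) calling convention. The struct QuadraticData, the toy function quadraticFit, and the default step h = 1e-6 are assumptions made for illustration; std::vector stands in for the Rcpp and Armadillo types:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical data bundle standing in for the Rcpp::List
struct QuadraticData {
  std::vector<double> target;
};

typedef double (*FitFunPtr)(const std::vector<double>&, const QuadraticData&);

// Toy fitting function: 0.5 * sum_i (par_i - target_i)^2,
// whose exact gradient is par - target.
double quadraticFit(const std::vector<double>& par, const QuadraticData& d) {
  assert(par.size() == d.target.size());
  double value = 0.0;
  for (std::size_t i = 0; i < par.size(); ++i) {
    const double diff = par[i] - d.target[i];
    value += 0.5 * diff * diff;
  }
  return value;
}

// Central differences: df/dpar_i ~ (f(par + h e_i) - f(par - h e_i)) / (2h).
// Costs 2 * par.size() evaluations of the fitting function per gradient.
std::vector<double> numericalGradient(FitFunPtr fit,
                                      const std::vector<double>& par,
                                      const QuadraticData& data,
                                      double h = 1e-6) {
  assert(h > 0.0);
  std::vector<double> gradient(par.size());
  std::vector<double> shifted = par;
  for (std::size_t i = 0; i < par.size(); ++i) {
    shifted[i] = par[i] + h;
    const double up = fit(shifted, data);
    shifted[i] = par[i] - h;
    const double down = fit(shifted, data);
    shifted[i] = par[i];  // restore before moving to the next parameter
    gradient[i] = (up - down) / (2.0 * h);
  }
  return gradient;
}
```

The cost of 2 × (number of parameters) fit evaluations per gradient is the main reason an analytic gradient is usually much faster.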
2. Adapting the functions to the constraints
Note that our fitting function and our gradient function do not comply with the constraints mentioned above. That is, they take more than two arguments (const arma::colvec parameters, const arma::mat X, const arma::colvec y, const int N), and these arguments are not a const Rcpp::NumericVector& and an Rcpp::List&. How can we make this work? The parameter vector const Rcpp::NumericVector& will hold all elements of arma::colvec parameters in our old function. The Rcpp::List& must contain all of the other elements (X, y, N). Let’s start by creating this list, which we will call data:
# the names must match those used in the C++ functions (data["y"], data["X"], data["N"])
data <- list("y" = y, "X" = Xext, "N" = length(y))
Next, we have to change our functions to make things work:
linreg <- '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
double fitfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
// our function now only takes the two specified arguments: a
// const Rcpp::NumericVector& and an Rcpp::List&.
// We have to extract all elements from the list:
arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
int N = Rcpp::as<int>(data["N"]); // the sample size
// Next, we want to get the parameters as a column-vector:
arma::colvec b = Rcpp::as<arma::colvec>(parameters);
// compute the sum of squared errors:
arma::mat sse = arma::trans(y-X*b)*(y-X*b);
// other packages, such as glmnet, scale the sse with
// 1/(2*N), where N is the sample size. We will do that here as well
sse *= 1.0/(2.0 * N);
// note: We must return a double, but the sse is a matrix
// To get a double, just return the single value that is in
// this matrix:
return(sse(0,0));
}
// [[Rcpp::export]]
arma::rowvec gradientfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
// our function now only takes the two specified arguments: a
// const Rcpp::NumericVector& and an Rcpp::List&.
// We have to extract all elements from the list:
arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
int N = Rcpp::as<int>(data["N"]); // the sample size
// Next, we want to get the parameters as a column-vector:
arma::colvec b = Rcpp::as<arma::colvec>(parameters);
// note: we want to return our gradients as row-vector; therefore,
// we have to transpose the resulting column-vector:
arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*b);
// the fit is scaled with 1/(2*N) (as in glmnet), so the
// gradients have to be scaled by the same factor:
gradients *= (.5/N);
return(gradients);
}
'
That’s it, our functions have been transformed!
3. Creating pointers to our functions
This is where it gets really tricky! We can’t just pass our C++ functions to the optimizer from R. However, we can pass pointers to them. These pointers have to be created in C++, which can be tricky to get right. To simplify the process, we have created the function makePtrs, which generates the required code:
cat(lessSEM::makePtrs(fitFunName = "fitfunction", # name of the function in C++
gradFunName = "gradientfunction" # name of the function in C++
)
)
#>
#> // INSTRUCTIONS: ADD THE FOLLOWING LINES TO YOUR C++ FUNCTIONS
#>
#> // IF RCPPARMADILLO IS NOT IMPORTED YET, UNCOMMENT THE FOLLOWING TWO LINES
#> // // [[Rcpp::depends(RcppArmadillo)]]
#> // #include <RcppArmadillo.h>
#>
#> // Dirk Eddelbuettel at
#> // https://gallery.rcpp.org/articles/passing-cpp-function-pointers/
#>
#> typedef double (*fitFunPtr)(const Rcpp::NumericVector&, //parameters
#> Rcpp::List& //additional elements
#> );
#> typedef Rcpp::XPtr<fitFunPtr> fitFunPtr_t;
#>
#> typedef arma::rowvec (*gradientFunPtr)(const Rcpp::NumericVector&, //parameters
#> Rcpp::List& //additional elements
#> );
#> typedef Rcpp::XPtr<gradientFunPtr> gradientFunPtr_t;
#>
#> // [[Rcpp::export]]
#> fitFunPtr_t fitfunctionPtr() {
#> return(fitFunPtr_t(new fitFunPtr(&fitfunction)));
#> }
#>
#> // [[Rcpp::export]]
#> gradientFunPtr_t gradientfunctionPtr() {
#> return(gradientFunPtr_t(new gradientFunPtr(&gradientfunction)));
#> }
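The generated code allocates a function pointer on the heap and wraps it in an Rcpp::XPtr, which lets R hold on to it as an external pointer. The following plain C++ analogue is only a sketch of that pattern: it assumes std::shared_ptr in place of Rcpp::XPtr, std::vector<double> in place of Rcpp::NumericVector, and a hypothetical struct Data in place of Rcpp::List:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the Rcpp::List holding additional elements
struct Data {
  double scale;
};

// Same idea as the generated typedefs, with std::shared_ptr playing the
// role of Rcpp::XPtr.
typedef double (*fitFunPtr)(const std::vector<double>&, Data&);
typedef std::shared_ptr<fitFunPtr> fitFunPtr_t;

// A toy fitting function: scale * sum of squared parameters
double fitfunction(const std::vector<double>& parameters, Data& data) {
  double sum = 0.0;
  for (double p : parameters) sum += p * p;
  return data.scale * sum;
}

// Mirrors the generated fitfunctionPtr(): allocate a function pointer on
// the heap and hand ownership to a smart pointer that can be passed around.
fitFunPtr_t fitfunctionPtr() {
  return fitFunPtr_t(new fitFunPtr(&fitfunction));
}

// What an optimizer does with the handle: dereference it once to get the
// raw function pointer, then call it like an ordinary function.
double evaluate(const fitFunPtr_t& handle,
                const std::vector<double>& parameters, Data& data) {
  assert(handle != nullptr);
  const fitFunPtr fn = *handle;
  return fn(parameters, data);
}
```

Because the handle stores only the address of fitfunction, calling through it involves no round trip to R.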
Let’s follow the instructions and add the lines to our C++ functions:
linreg <- '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
double fitfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
// our function now only takes the two specified arguments: a
// const Rcpp::NumericVector& and an Rcpp::List&.
// We have to extract all elements from the list:
arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
int N = Rcpp::as<int>(data["N"]); // the sample size
// Next, we want to get the parameters as a column-vector:
arma::colvec b = Rcpp::as<arma::colvec>(parameters);
// compute the sum of squared errors:
arma::mat sse = arma::trans(y-X*b)*(y-X*b);
// other packages, such as glmnet, scale the sse with
// 1/(2*N), where N is the sample size. We will do that here as well
sse *= 1.0/(2.0 * N);
// note: We must return a double, but the sse is a matrix
// To get a double, just return the single value that is in
// this matrix:
return(sse(0,0));
}
// [[Rcpp::export]]
arma::rowvec gradientfunction(const Rcpp::NumericVector& parameters, Rcpp::List& data){
// our function now only takes the two specified arguments: a
// const Rcpp::NumericVector& and an Rcpp::List&.
// We have to extract all elements from the list:
arma::colvec y = Rcpp::as<arma::colvec>(data["y"]); // the dependent variable
arma::mat X = Rcpp::as<arma::mat>(data["X"]); // the design matrix
int N = Rcpp::as<int>(data["N"]); // the sample size
// Next, we want to get the parameters as a column-vector:
arma::colvec b = Rcpp::as<arma::colvec>(parameters);
// note: we want to return our gradients as row-vector; therefore,
// we have to transpose the resulting column-vector:
arma::rowvec gradients = arma::trans(-2.0*X.t() * y + 2.0*X.t()*X*b);
// the fit is scaled with 1/(2*N) (as in glmnet), so the
// gradients have to be scaled by the same factor:
gradients *= (.5/N);
return(gradients);
}
/// THE FOLLOWING PART IS NEW:
// INSTRUCTIONS: ADD THE FOLLOWING LINES TO YOUR C++ FUNCTIONS
// IF RCPPARMADILLO IS NOT IMPORTED YET, UNCOMMENT THE FOLLOWING TWO LINES
// // [[Rcpp::depends(RcppArmadillo)]]
// #include <RcppArmadillo.h>
// Dirk Eddelbuettel at
// https://gallery.rcpp.org/articles/passing-cpp-function-pointers/
typedef double (*fitFunPtr)(const Rcpp::NumericVector&, //parameters
Rcpp::List& //additional elements
);
typedef Rcpp::XPtr<fitFunPtr> fitFunPtr_t;
typedef arma::rowvec (*gradientFunPtr)(const Rcpp::NumericVector&, //parameters
Rcpp::List& //additional elements
);
typedef Rcpp::XPtr<gradientFunPtr> gradientFunPtr_t;
// [[Rcpp::export]]
fitFunPtr_t fitfunctionPtr() {
return(fitFunPtr_t(new fitFunPtr(&fitfunction)));
}
// [[Rcpp::export]]
gradientFunPtr_t gradientfunctionPtr() {
return(gradientFunPtr_t(new gradientFunPtr(&gradientfunction)));
}
'
Compile the functions using Rcpp:
Rcpp::sourceCpp(code = linreg)
Great! Now that this is out of the way, we can create the pointers to our functions:
ffp <- fitfunctionPtr() # create the pointer to the fitting function
# Note that the name of this function will depend on the name of your fitting function.
# For instance, if your fitting function is called sse, then the pointer will be created
# with ffp <- ssePtr()
gfp <- gradientfunctionPtr() # create the pointer to the gradient function
# Note that the name of this function will depend on the name of your gradient function.
# For instance, if your gradient function is called sseGradient, then the pointer will be created
# with gfp <- sseGradientPtr()
Optimizing the model
The last step is to call the general-purpose optimizer. To this end, use the gpLassoCpp function:
l1 <- gpLassoCpp(par = par,
regularized = regularized,
# important: pass the pointers!
fn = ffp,
gr = gfp,
lambdas = lambdas,
# finally, pass the list which the fitting function and the
# gradient function need:
additionalArguments = data
)
head(l1@parameters)
#> lambda alpha theta b0 b1 b2 b3 b4
#> 1 0.000000000 1 0 0.02738583 1.0129200 0.9991446 0.9705725 1.027626
#> 2 0.005263158 1 0 0.02935279 1.0043739 0.9908927 0.9626259 1.025139
#> 3 0.010526316 1 0 0.02995027 0.9967093 0.9846669 0.9552792 1.021892
#> 4 0.015789474 1 0 0.03010682 0.9897334 0.9789425 0.9481493 1.018673
#> 5 0.021052632 1 0 0.03029725 0.9827287 0.9732059 0.9409868 1.015363
#> 6 0.026315789 1 0 0.03112477 0.9753367 0.9670621 0.9338617 1.011553
#> b5 b6 b7 b8 b9 b10
#> 1 0.014034458 -0.007459281 0.0185900689 0.021931006 -0.009900374 0.027400852
#> 2 0.003365898 0.000000000 0.0143413737 0.015434574 -0.007939162 0.022297256
#> 3 0.000000000 0.000000000 0.0096218864 0.010707576 -0.005256811 0.017464871
#> 4 0.000000000 0.000000000 0.0049333060 0.006363934 -0.002393508 0.012713075
#> 5 0.000000000 0.000000000 0.0001771637 0.002036388 0.000000000 0.007969876
#> 6 0.000000000 0.000000000 0.0000000000 0.000000000 0.000000000 0.003303873
Benchmarking this approach results in:
#> [1] 0.01327848 0.01320338 0.01323175 0.01311016 0.01310015
So, we have reduced our runtime even more!
The third and fourth approach: Including the header files
This approach requires a more elaborate setup, which is why we have created a whole package to demonstrate it. You will find more information in the vignette The-optimizer-interface and in the lessLM package. If you just want the optimizers and don’t want to depend on the lessSEM package, we recommend that you copy the lesstimate C++ library into your package’s inst/include folder.
This approach yields essentially the same parameter estimates:
#> b0 b1 b2 b3 b4 b5 b6
#> [1,] 0.02734701 1.0129361 0.9991629 0.9705501 1.027728 0.013993181 -0.007491533
#> [2,] 0.02939675 1.0043635 0.9908681 0.9626493 1.025035 0.003400965 0.000000000
#> [3,] 0.02998680 0.9967117 0.9846504 0.9552960 1.021803 0.000000000 0.000000000
#> [4,] 0.03006777 0.9897315 0.9789595 0.9481301 1.018774 0.000000000 0.000000000
#> [5,] 0.03032345 0.9827556 0.9731799 0.9409662 1.015441 0.000000000 0.000000000
#> [6,] 0.03111085 0.9753325 0.9670615 0.9338506 1.011607 0.000000000 0.000000000
#> b7 b8 b9 b10
#> [1,] 0.0186210155 0.021974963 -0.009975776 0.027466466
#> [2,] 0.0143089363 0.015388336 -0.007860992 0.022231196
#> [3,] 0.0095927088 0.010679371 -0.005183552 0.017406513
#> [4,] 0.0049636831 0.006397664 -0.002474334 0.012779758
#> [5,] 0.0001374354 0.002100692 0.000000000 0.008033475
#> [6,] 0.0000000000 0.000000000 0.000000000 0.003352431
And the run times are even lower:
#> [1] 0.002040863 0.001126051 0.001076460 0.001072645 0.001070738