ISTA
The implementation of ista follows the procedure outlined in Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1), 183–202. https://doi.org/10.1137/080716542; see Remark 3.1 on p. 191 (ISTA with backtracking).
The GIST variant can be found in Gong, P., Zhang, C., Lu, Z., Huang, J., & Ye, J. (2013). A General Iterative Shrinkage and Thresholding Algorithm for Non-convex Regularized Optimization Problems. Proceedings of the 30th International Conference on Machine Learning, 28(2), 37–45.
fitIsta
We provide two optimizer interfaces: The first uses a combination of arma::rowvec and less::stringVector for the starting values and parameter labels, respectively. This interface is consistent with the fit and gradient functions of the less::model class. Alternatively, a numericVector can be passed to the optimizers. This design is rooted in the use of Rcpp::NumericVector, which combines values and labels similar to a named R vector. Interfacing with this second function call can therefore be easier when coming from R.
Version 1
- param userModel: your model. Must inherit from less::model!
- param startingValues: numericVector with initial starting values. This vector can have names.
- param penalty: vector with strings indicating the penalty for each parameter. Currently supported are "none", "cappedL1", "lasso", "lsp", "mcp", and "scad". (e.g., {"none", "scad", "scad", "lasso", "none"}). If only one value is provided, the same penalty will be applied to every parameter!
- param lambda: lambda tuning parameter values. One lambda value for each parameter. If only one value is provided, this value will be applied to each parameter. Important: The function will not loop over these values; it assumes that you may want to provide a different level of regularization for each parameter!
- param theta: theta tuning parameter values. One theta value for each parameter. If only one value is provided, this value will be applied to each parameter. Not all penalties use theta. Important: The function will not loop over these values; it assumes that you may want to provide a different level of regularization for each parameter!
- param controlOptimizer: option to change the optimizer settings
- param verbose: should additional information be printed? If set > 0, additional information will be provided. Highly recommended for initial runs. Note that the optimizer itself has a separate verbose argument that can be used to print information on each iteration. This can be set with the controlOptimizer argument.
- return fitResults
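For illustration, a minimal sketch of a call to this interface. The model class linearRegressionModel, the header name, and the exact vector types used for penalty, lambda, and theta are assumptions for the sake of the example; only the argument order follows the parameter list above.

```cpp
// Hedged sketch of fitIsta, version 1. linearRegressionModel is a hypothetical
// user model inheriting from less::model; the header name and the way names
// are attached to the numericVector are assumptions for illustration.
#include "lesstimate.h"

less::fitResults runFit(linearRegressionModel &model,
                        less::numericVector &startingValues) { // named values, like an R vector
  std::vector<std::string> penalty{"none", "lasso"}; // one entry per parameter
  arma::rowvec lambda{0.0, 0.1};                     // one lambda per parameter
  arma::rowvec theta{0.0, 0.0};                      // theta is unused by "none" and "lasso"

  // controlOptimizer and verbose are assumed to keep their defaults here
  return less::fitIsta(model, startingValues, penalty, lambda, theta);
}
```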
Version 2
- param userModel: your model. Must inherit from less::model!
- param startingValues: an arma::rowvec numeric vector with starting values
- param parameterLabels: a less::stringVector with labels for parameters
- param penalty: vector with strings indicating the penalty for each parameter. Currently supported are "none", "cappedL1", "lasso", "lsp", "mcp", and "scad". (e.g., {"none", "scad", "scad", "lasso", "none"}). If only one value is provided, the same penalty will be applied to every parameter!
- param lambda: lambda tuning parameter values. One lambda value for each parameter. If only one value is provided, this value will be applied to each parameter. Important: The function will not loop over these values; it assumes that you may want to provide a different level of regularization for each parameter!
- param theta: theta tuning parameter values. One theta value for each parameter. If only one value is provided, this value will be applied to each parameter. Not all penalties use theta. Important: The function will not loop over these values; it assumes that you may want to provide a different level of regularization for each parameter!
- param controlOptimizer: option to change the optimizer settings
- param verbose: should additional information be printed? If set > 0, additional information will be provided. Highly recommended for initial runs. Note that the optimizer itself has a separate verbose argument that can be used to print information on each iteration. This can be set with the controlOptimizer argument.
- return fitResults
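The same fit using the second interface, again as a hedged sketch (model as above; the stringVector constructor shown is an assumption for illustration):

```cpp
// Hedged sketch of fitIsta, version 2: values and labels are passed separately.
arma::rowvec startingValues{0.0, 0.0};
less::stringVector parameterLabels(std::vector<std::string>{"b1", "b2"}); // assumed constructor

std::vector<std::string> penalty{"none", "lasso"};
arma::rowvec lambda{0.0, 0.1};
arma::rowvec theta{0.0, 0.0};

less::fitResults result = less::fitIsta(model, startingValues, parameterLabels,
                                        penalty, lambda, theta);
```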
ista
Version 1
Implements (variants of) the ista optimizer.
- param model_: the model object derived from the model class in model.h
- param startingValuesRcpp: an Rcpp numeric vector with starting values
- param proximalOperator_: a proximal operator for the penalty function
- param penalty_: a penalty derived from the penalty class in penalty.h
- param smoothPenalty_: a smooth penalty derived from the smoothPenalty class in smoothPenalty.h
- param tuningParameters: tuning parameters for the penalty function
- param smoothTuningParameters: tuning parameters for the smooth penalty function
- param control_: settings for the ista optimizer.
- return fit result
Version 2
Implements (variants of) the ista optimizer.
- param model_: the model object derived from the model class in model.h
- param startingValues: an arma::rowvec numeric vector with starting values
- param parameterLabels: a less::stringVector with labels for parameters
- param proximalOperator_: a proximal operator for the penalty function
- param penalty_: a penalty derived from the penalty class in penalty.h
- param smoothPenalty_: a smooth penalty derived from the smoothPenalty class in smoothPenalty.h
- param tuningParameters: tuning parameters for the penalty function
- param smoothTuningParameters: tuning parameters for the smooth penalty function
- param control_: settings for the ista optimizer.
- return fit result
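To make the moving parts concrete, here is a hedged sketch of version 2 combining the lasso classes documented below (proximalOperatorLasso, penaltyLASSO, penaltyRidge, tuningParametersEnet). The field names of the tuning-parameter structs and the use of controlDefault() are assumptions based on this documentation, not a verbatim transcript of the headers.

```cpp
// Hedged sketch of the low-level ista interface; class names follow this
// documentation, field names and controlDefault() are assumptions.
#include "lesstimate.h"   // assumed umbrella header

less::fitResults runIsta(linearRegressionModel &model,  // hypothetical less::model subclass
                         arma::rowvec &startingValues,
                         less::stringVector &parameterLabels) {
  less::proximalOperatorLasso proximalOperator;  // prox for the non-smooth part
  less::penaltyLASSO penalty;                    // non-smooth penalty
  less::penaltyRidge smoothPenalty;              // smooth (differentiable) part

  less::tuningParametersEnet tp, smoothTp;
  tp.lambda = 0.1; tp.alpha = 1.0;               // assumed fields; alpha = 1: pure lasso
  smoothTp.lambda = 0.0; smoothTp.alpha = 0.0;   // ridge part switched off

  auto control = less::controlDefault();         // see controlDefault below

  return less::ista(model, startingValues, parameterLabels,
                    proximalOperator, penalty, smoothPenalty,
                    tp, smoothTp, control);
}
```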
controlDefault
Returns the default optimizer settings.
Optimizer settings
The ista optimizer has the following additional settings:
- L0: a double controlling the step size used in the first iteration
- eta: a double controlling by how much the step size changes in the inner iterations with $\eta^i \cdot L$, where $i$ is the inner iteration
- accelerate: a bool specifying whether the extrapolation parameter is used to accelerate ista (see, e.g., Parikh, N., & Boyd, S. (2013). Proximal Algorithms. Foundations and Trends in Optimization, 1(3), 123–231, p. 152)
- maxIterOut: an int specifying the maximal number of outer iterations
- maxIterIn: an int specifying the maximal number of inner iterations
- breakOuter: a double specifying the stopping criterion for the outer iterations
- breakInner: a double specifying the stopping criterion for the inner iterations
- convCritInner: a convCritInnerIsta that specifies the inner breaking condition. Can be set to less::istaCrit (see Beck & Teboulle, 2009, Remark 3.1 on p. 191; ISTA with backtracking) or less::gistCrit (see Gong et al., 2013, Equation 3)
- sigma: a double in (0,1) that is used by the gist convergence criterion. Larger sigma values enforce a larger improvement in fit
- stepSizeIn: a stepSizeInheritance that specifies how step sizes should be carried forward from iteration to iteration. less::initial resets the step size to L0 in each iteration; less::istaStepInheritance takes the previous step size as the initial value for the next iteration; less::barzilaiBorwein uses the Barzilai-Borwein procedure; less::stochasticBarzilaiBorwein uses the Barzilai-Borwein procedure, but sometimes resets the step size, which can help when the optimizer is caught in a bad spot
- sampleSize: an int that can be used to scale the fitting function down if the fitting function depends on the sample size
- verbose: an int, where 0 prints no additional information and values > 0 print information on each iteration
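A hedged sketch of adjusting these settings: the name of the settings type is not given in this documentation, so auto is used, and the field names simply follow the list above.

```cpp
// Assumed usage: start from the defaults and override individual settings.
auto control = less::controlDefault();
control.L0 = 0.1;                         // initial step size
control.eta = 2.0;                        // step-size growth in inner iterations
control.accelerate = true;                // use extrapolation
control.maxIterOut = 1000;
control.maxIterIn = 500;
control.breakOuter = 1e-8;
control.convCritInner = less::istaCrit;   // or less::gistCrit
control.sigma = 0.1;                      // only used by the gist criterion
control.stepSizeIn = less::barzilaiBorwein;
control.verbose = 0;
```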
convCritInnerIsta
Convergence criteria used by the ista optimizer.
- value istaCrit: the approximated fit based on the quadratic approximation $$h(\theta_k) := f(\theta_{k-1}) + (\theta_k - \theta_{k-1}) \nabla f(\theta_{k-1})^T + \frac{L}{2}\|\theta_k - \theta_{k-1}\|_2^2 + p(\theta_k)$$ is compared to the exact fit $f(\theta_k) + p(\theta_k)$, where $\theta_k$ are the parameters at iteration $k$, $f$ is the fit, and $p$ the penalty (see Beck & Teboulle, 2009, Remark 3.1)
- value gistCrit: the exact fit is compared to $$h(\theta_k) := f(\theta_{k-1}) + p(\theta_{k-1}) - \frac{\sigma L}{2}\|\theta_k - \theta_{k-1}\|_2^2$$ (see Gong et al., 2013, Equation 3)
stepSizeInheritance
The ista optimizer provides different rules for determining the initial step size of an iteration. This setting defines if and how the step size is carried forward from one iteration to the next.
- value initial: resets the step size to L0 in each iteration
- value istaStepInheritance: takes the previous step size as initial value for the next iteration
- value barzilaiBorwein: uses the Barzilai-Borwein procedure
- value stochasticBarzilaiBorwein: uses the Barzilai-Borwein procedure, but sometimes resets the step size; this can help when the optimizer is caught in a bad spot.
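For reference, a common form of the Barzilai-Borwein step length (the textbook formula; the library's exact implementation may differ): with $\Delta\theta = \theta_k - \theta_{k-1}$ and $\Delta g = \nabla f(\theta_k) - \nabla f(\theta_{k-1})$, the inverse step size for the next iteration is set to $$L_{k+1} = \frac{\Delta\theta \, \Delta g^T}{\Delta\theta \, \Delta\theta^T}$$ so that curvature information from the last two iterations determines the initial step size.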
Penalties
CappedL1
tuningParametersCappedL1
Tuning parameters for the cappedL1 penalty using ista
- param lambda: lambda value >= 0
- param alpha: alpha value of the elastic net (relative importance of ridge and lasso)
- param weights: provide parameter-specific weights (e.g., for adaptive lasso)
- param theta: threshold parameter; any parameter whose absolute value lies above this threshold will only receive the constant penalty $\lambda_i\theta$, while any parameter below it will be penalized with $\lambda_i |x_i|$
proximalOperatorCappedL1
Proximal operator for the cappedL1 penalty function
penaltyCappedL1
CappedL1 penalty for ista
The penalty function is given by: $$p(x_j) = \lambda \min(|x_j|, \theta)$$ where $\theta > 0$. The cappedL1 penalty is identical to the lasso for parameters whose absolute value is below $\theta$ and identical to a constant for parameters whose absolute value is above $\theta$. As adding a constant to the fitting function will not change its minimum, larger parameters can stay unregularized while smaller ones are set to zero.
CappedL1 regularization:
- Zhang, T. (2010). Analysis of Multi-stage Convex Relaxation for Sparse Regularization. Journal of Machine Learning Research, 11, 1081–1107.
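As a small illustration of the definition above, evaluating the cappedL1 penalty for a single parameter (a sketch of the formula, not the library's internal implementation):

```cpp
#include <algorithm>
#include <cmath>

// cappedL1 penalty value for one parameter: lambda * min(|x|, theta)
double cappedL1(const double x, const double lambda, const double theta) {
  return lambda * std::min(std::fabs(x), theta);
}
// e.g., with lambda = 0.2 and theta = 1: cappedL1(0.5, 0.2, 1.0) = 0.1, while
// cappedL1(3.0, 0.2, 1.0) = cappedL1(10.0, 0.2, 1.0) = 0.2 (constant beyond theta)
```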
lasso
tuningParametersEnet
Tuning parameters for the lasso penalty using ista
- param lambda: lambda value >= 0
- param alpha: alpha value of the elastic net (relative importance of ridge and lasso)
- param weights: provide parameter-specific weights (e.g., for adaptive lasso)
proximalOperatorLasso
Proximal operator for the lasso penalty function
penaltyLASSO
Lasso penalty for ista
The penalty function is given by: $$p(x_j) = \lambda |x_j|$$ Lasso regularization will set parameters to zero if $\lambda$ is large enough.
Lasso regularization:
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
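The proximal operator implemented by proximalOperatorLasso is, in its textbook form, the soft-thresholding function (the scaling of $\lambda$ by the inverse step size $L$ shown here is the standard choice, not a verbatim transcript of the library source): given the gradient step $u_j = x_j - \frac{1}{L}\nabla f(x)_j$, the update is $$x_j \leftarrow \operatorname{sign}(u_j)\max\left(|u_j| - \frac{\lambda}{L},\ 0\right)$$ which sets $x_j$ to zero whenever $|u_j| \leq \lambda/L$.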
LSP
tuningParametersLSP
Tuning parameters for the lsp penalty using ista
- param weights: provide parameter-specific weights (e.g., for adaptive lasso)
- param lambda: lambda value >= 0
- param theta: theta value of the lsp penalty > 0
proximalOperatorLSP
Proximal operator for the lsp penalty function
penaltyLSP
Lsp penalty for ista
The penalty function is given by: $$p( x_j) = \lambda \log(1 + |x_j|/\theta)$$ where $\theta > 0$.
lsp regularization:
- Candès, E. J., Wakin, M. B., & Boyd, S. P. (2008). Enhancing Sparsity by Reweighted l1 Minimization. Journal of Fourier Analysis and Applications, 14(5–6), 877–905. https://doi.org/10.1007/s00041-008-9045-x
MCP
tuningParametersMcp
Tuning parameters for the mcp penalty optimized with ista
- param weights: provide parameter-specific weights (e.g., for adaptive lasso)
- param lambda: lambda value >= 0
- param theta: theta value of the mcp penalty > 1
proximalOperatorMcp
Proximal operator for the mcp penalty function
penaltyMcp
Mcp penalty for ista
The penalty function is given by: $$p(x_j) = \begin{cases} \lambda |x_j| - x_j^2/(2\theta) & \text{if } |x_j| \leq \theta\lambda\\ \theta\lambda^2/2 & \text{if } |x_j| > \theta\lambda \end{cases}$$ where $\theta > 1$.
mcp regularization:
- Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2), 894–942. https://doi.org/10.1214/09-AOS729
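A sketch of evaluating the piecewise definition above for a single parameter (illustrative only, not the library's internal implementation):

```cpp
#include <cmath>

// mcp penalty value for one parameter, following the piecewise definition
double mcp(const double x, const double lambda, const double theta) {
  const double absX = std::fabs(x);
  if (absX <= theta * lambda)
    return lambda * absX - (x * x) / (2.0 * theta);
  return theta * lambda * lambda / 2.0;  // constant beyond theta * lambda
}
```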
Mixed penalty
tuningParametersMixedPenalty
Tuning parameters for the mixed penalty optimized with ista
- param penaltyType_: penaltyType-vector specifying the penalty to be used for each parameter
- param lambda: provide parameter-specific lambda values
- param theta: theta value of the mixed penalty > 0
- param alpha: alpha value of the mixed penalty > 0
- param weights: provide parameter-specific weights (e.g., for adaptive lasso)
proximalOperatorMixedPenalty
Proximal operator for the mixed penalty function
penaltyMixedPenalty
Mixed penalty
Ridge
tuningParametersEnet
Tuning parameters for the ridge penalty using ista; the elastic net tuning parameters are shared with the lasso
- param lambda: lambda value >= 0
- param alpha: alpha value of the elastic net (relative importance of ridge and lasso)
- param weights: provide parameter-specific weights (e.g., for adaptive lasso)
penaltyRidge
Ridge penalty for ista
The penalty function is given by: $$p(x_j) = \lambda x_j^2$$ Note that ridge regularization will not set any of the parameters to zero; it only shrinks them towards zero.
Ridge regularization:
- Hoerl, A. E., & Kennard, R. W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(1), 55–67. https://doi.org/10.1080/00401706.1970.10488634
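Because the ridge penalty is differentiable, it can serve as the smooth penalty part of the ista interface above (see penaltyRidge in the low-level sketch); its gradient with respect to $x_j$ is simply $$\frac{\partial p(x_j)}{\partial x_j} = 2 \lambda x_j$$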
SCAD
tuningParametersScad
Tuning parameters for the scad penalty optimized with ista
- param weights: provide parameter-specific weights (e.g., for adaptive lasso)
- param lambda: lambda value >= 0
- param theta: theta value of the scad penalty > 2
proximalOperatorScad
Proximal operator for the scad penalty function
penaltyScad
Scad penalty for ista
The penalty function is given by: $$p(x_j) = \begin{cases} \lambda |x_j| & \text{if } |x_j| \leq \lambda\\ \frac{-x_j^2 + 2\theta\lambda |x_j| - \lambda^2}{2(\theta - 1)} & \text{if } \lambda < |x_j| \leq \theta\lambda\\ (\theta + 1)\lambda^2/2 & \text{if } |x_j| > \theta\lambda \end{cases}$$ where $\theta > 2$.
scad regularization:
- Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360. https://doi.org/10.1198/016214501753382273
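A sketch of the piecewise evaluation for a single parameter, mirroring the definition above (illustrative only, not the library's internal implementation):

```cpp
#include <cmath>

// scad penalty value for one parameter, following the piecewise definition
double scad(const double x, const double lambda, const double theta) {
  const double absX = std::fabs(x);
  if (absX <= lambda)
    return lambda * absX;
  if (absX <= theta * lambda)
    return (-x * x + 2.0 * theta * lambda * absX - lambda * lambda) /
           (2.0 * (theta - 1.0));
  return (theta + 1.0) * lambda * lambda / 2.0;  // constant beyond theta * lambda
}
```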