regCtsem
regCtsem.Rd
Regularized Continuous Time Structural Equation Modeling.
regCtsem uses objects from ctsemOMX and implements lasso, adaptive lasso and ridge regularization.
Usage
regCtsem(
ctsemObject,
dataset,
regIndicators,
targetVector = NULL,
lambdas = "auto",
lambdasAutoLength = 50,
lambdasAutoCurve = 10,
penalty = "lasso",
adaptiveLassoWeights = NULL,
adaptiveLassoPower = -1,
cvSample = NULL,
autoCV = "No",
k = 5,
sparseParameters = NULL,
subjectSpecificParameters = NULL,
standardizeDrift = "No",
scaleLambdaWithN = TRUE,
returnFitIndices = TRUE,
BICWithNAndT = FALSE,
optimization = "exact",
optimizer = "GIST",
control = list(),
verbose = 0,
trainingWheels = TRUE
)
Arguments
- ctsemObject
Fitted object of class ctsemFit
- dataset
Data set in wide format compatible with ctsemOMX
- regIndicators
Labels of the regularized parameters (e.g. drift_eta1_eta2).
- targetVector
named vector with values towards which the parameters are regularized (Standard is regularization towards zero)
- lambdas
vector of penalty values (tuning parameter). E.g., seq(0,1,.01). Alternatively, lambdas can be set to "auto". regCtsem will then compute an upper limit for lambda and test lambdasAutoLength increasing lambda values
- lambdasAutoLength
if lambdas == "auto", lambdasAutoLength will determine the number of lambdas tested.
- lambdasAutoCurve
It is often a good idea to have unequally spaced lambda steps (e.g., .01,.02,.05,1,5,20). If lambdasAutoCurve is close to 1 lambda values will be equally spaced, if lambdasAutoCurve is large lambda values will be more concentrated close to 0. See ?getCurvedLambda for more informations.
- penalty
Currently supported are lasso, ridge and adaptiveLasso
- adaptiveLassoWeights
weights for the adaptive lasso. Defaults to 1/(|theta|^adaptiveLassoPower), where theta is the maximum likelihood estimate of the regularized parameters.
- adaptiveLassoPower
power for the adaptive lasso weights. The weights will be set to 1/(|theta|^adaptiveLassoPower).
- cvSample
cross-validation sample. Has to be in wide format and compatible with ctsemOMX
- autoCV
Should automatic cross-validation be used? Possible are "No", "kFold" or "Blocked". kFold splits the dataset in k groups by selecting independent units from the rows. Blocked is a within-unit split, where for each person blocks of observations are deleted. See Bulteel, K., Mestdagh, M., Tuerlinckx, F., & Ceulemans, E. (2018). VAR(1) based models do not always outpredict AR(1) models in typical psychological applications. Psychological Methods, 23(4), 740-756. https://doi.org/10.1037/met0000178 for a more detailed explanation
- k
number of cross-validation folds if autoCV = "kFold" or autoCV = "Blocked"
- sparseParameters
labeled vector with parameter estimates of the most sparse model. If regValues = "auto" the sparse parameters will be computed automatically.
- subjectSpecificParameters
EXPERIMENTAL! A vector of parameter labels for parameters which should be estimated person-specific. If these parameter labels are also passed to regIndicators, all person-specific parameters will be regularized towards a group-parameter. This is a 2-step-procedure: In step 1 all parameters are constrained to equality between individuals to estimate the group parameters. In step 2 the parameters are estimated person-specific, but regularized towards the group parameter from step 1.
- standardizeDrift
Should Drift parameters be standardized automatically? Set to 'No' for no standardization, 'T0VAR' for standardization using the T0VAR or 'asymptoticDiffusion' for standardization using the asymptotic diffusion
- scaleLambdaWithN
Boolean: Should the penalty value be scaled with the sample size? True is recommended as the likelihood is also sample size dependent
- returnFitIndices
Boolean: should fit indices be returned?
- BICWithNAndT
Boolean: TRUE = Use N and T in the formula for the BIC (-2log L + log(N+T)*k, where k is the number of parameters in the model). FALSE = Use N in the formula for the BIC (-2log L + log(N)). Defaults to FALSE
- optimization
which optimization procedure should be used. Possible are "exact" or "approx". exact is recommended for sparsity inducing penalty functions (lasso and adaptive lasso)
- optimizer
for exact optimization: Either GIST or GLMNET. When using optimization = "approx", Rsolnp or any of the optimizers in optimx can be used. See ?optimx
- control
List with control arguments for the optimizer. See ?controlGIST, ?controlGLMNET and ?controlApprox for the respective parameters
- verbose
0 (default), 1 for convergence plot, 2 for parameter convergence plot and line search progress.
- trainingWheels
If set to FALSE all bells and whistles used to keep regCtsem on track are turned off (no multiple starting values, no initial optimization with solnp or optimx). The focus is speed instead of accuracy. This might work in simulated data, but is NOT recommended with real data. The optimizer is quite likely to get stuck in local minima.
Value
returns an object of class regCtsem. Without cross-validation, this object will have the fields setup (all arguments passed to the function), fitAndParameters (used internally to store the fit and the raw (i.e., untransformed) parameters), fit (fit indices, ect.), parameterEstimatesRaw (raw, i.e. untransformed parameters; used internally), and parameters (transformed parameters)
Details
REGularized Continuous Time Structural Equation Models (regCtsem) implements least absolute shrinkage and selection operator (LASSO; Tibshirani, 1996) and adaptive LASSO (Zou, 2006) regularization for models fitted with ctsemOMX (Driver et al., 2017). See vignette("regCtsem", package = "regCtsem") for an introduction and additional examples.
References:
* Driver, C. C., Oud, J. H. L., & Voelkle, M. C. (2017). Continuous Time Structural Equation Modelling With R Package ctsem. Journal of Statistical Software, 77(5), 1-36. https://doi.org/10.18637/jss.v077.i05
* Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267-288.
* Zou, H. (2006). The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101(476), 1418-1429. https://doi.org/10.1198/016214506000000735
NOTE: Function located in file regCtsem.R
Examples
# \donttest{
set.seed(17046)
library(regCtsem)
#### Example 1 ####
## Regularization with FIML objective function
## Population model:
# set the drift matrix. Note that drift eta_1_eta2 is set to equal 0 in the population.
ct_drift <- matrix(c(-.3,.2,0,-.5), ncol = 2)
generatingModel<-ctsem::ctModel(Tpoints=10,n.latent=2,n.TDpred=0,
n.TIpred=0,n.manifest=2,
MANIFESTVAR=diag(0,2),
LAMBDA=diag(1,2),
DRIFT=ct_drift,
DIFFUSION=matrix(c(.5,0,0,.5),2),
CINT=matrix(c(0,0),nrow=2),
T0MEANS=matrix(0,ncol=1,nrow=2),
T0VAR=diag(1,2), type = "omx")
# simulate a training data set
dat <- ctsem::ctGenerate(generatingModel, n.subjects = 100, wide = TRUE)
## Build the analysis model. Note that drift eta1_eta2 is freely estimated
# although it is 0 in the population.
myModel <- ctsem::ctModel(Tpoints=10,n.latent=2,n.TDpred=0,
n.TIpred=0,n.manifest=2,
LAMBDA=diag(1,2),
MANIFESTVAR=diag(0,2),
CINT=matrix(c(0,0),nrow=2),
DIFFUSION=matrix(c('eta1_eta1',0,0,'eta2_eta2'),2),
T0MEANS=matrix(0,ncol=1,nrow=2),
T0VAR="auto", type = "omx")
# fit the model using ctsemOMX:
fit_myModel <- ctsemOMX::ctFit(dat, myModel)
# select DRIFT values for regularization:
# Note: If you are unsure what the parameters are called in
# your model, check: showParameters(fit_myModel)
showParameters(fit_myModel)
# regularize the cross-effects:
regIndicators <- c("drift_eta2_eta1", "drift_eta1_eta2")
# Optimize model using GIST with lasso penalty
regModel <- regCtsem(ctsemObject = fit_myModel,
dataset = dat,
regIndicators = regIndicators,
lambdas = "auto",
lambdasAutoLength = 20)
summary(regModel)
plot(regModel)
plot(regModel, what = "fit", criterion = c("AIC", "BIC", "m2LL"))
# The best parameter estimates and the final model as mxObject can be extracted with:
# getFinalParameters(regCtsemObject = regModel, criterion = "BIC")
# bestModel <- getFinalModel(regCtsemObject = regModel, criterion = "BIC")
# WARNING: The returned model is of type cpptsem. You can access it's elements with the
# $ operator. For example: bestModel$DRIFTValues
# WARNING: If you load an existing regCtsem object, the underlying C++ model will no longer
# exist. You can restore this model with restore(). Example:
# save(regModel, file = "regModel.RData")
# load("regModel.RData")
# regModel <- restore(regModel)
# regModel$setup$cpptsemObject$DRIFTValues
# Optimize model using GLMNET with lasso penalty
regModel <- regCtsem(ctsemObject = fit_myModel,
dataset = dat,
regIndicators = regIndicators,
lambdas = "auto",
lambdasAutoLength = 20,
optimizer = "GLMNET")
summary(regModel, criterion = "BIC")
plot(regModel, what = "drift")
plot(regModel, what = "fit", criterion = c("AIC", "BIC", "m2LL"))
plot(regModel, what = "drift_eta1_eta2")
# The same regularization can be performed with the approximate optimization:
regModelApprox <- regCtsem(ctsemObject = fit_myModel,
dataset = dat,
regIndicators = regIndicators,
lambdas = "auto",
lambdasAutoLength = 20,
optimization = "approx",
control = list(
epsilon = .001, # epsilon is used to transform the non-differentiable
#lasso penalty to a differentiable one if optimization = approx
zeroThresh = .04 # threshold below which parameters will be evaluated as == 0
))
# Comparison of parameter estimates:
round(regModel$fitAndParameters - regModelApprox$fitAndParameters,4)
#### Example 2 ####
## Regularization with Kalman objective function
set.seed(175446)
## define the population model:
# set the drift matrix. Note that drift eta_1_eta2 is set to equal 0 in the population.
ct_drift <- matrix(c(-.3,.2,0,-.5), ncol = 2)
generatingModel<-ctsem::ctModel(Tpoints=100,n.latent=2,
n.TDpred=0,n.TIpred=0,n.manifest=2,
MANIFESTVAR=diag(0,2),
LAMBDA=diag(1,2),
DRIFT=ct_drift,
DIFFUSION=matrix(c(.5,0,0,.5),2),
CINT=matrix(c(0,0),nrow=2),
T0MEANS=matrix(0,ncol=1,nrow=2),
T0VAR=diag(1,2), type = "omx")
# simulate a training data and testing data set
traindata <- ctsem::ctGenerate(generatingModel,n.subjects = 10, wide = TRUE)
testdata <- ctsem::ctGenerate(generatingModel,n.subjects = 10, wide = TRUE)
## Build the analysis model. Note that drift eta1_eta2 is freely estimated
# although it is 0 in the population.
myModel <- ctsem::ctModel(Tpoints=100,n.latent=2,n.TDpred=0,
n.TIpred=0,n.manifest=2,
LAMBDA=diag(1,2),
MANIFESTVAR=diag(0,2),
CINT=matrix(c(0,0),nrow=2),
DIFFUSION=matrix(c('eta1_eta1',0,0,'eta2_eta2'),2),
T0MEANS=matrix(0,ncol=1,nrow=2),
T0VAR="auto", type = "omx")
fit_myModel <- ctFit(dat = traindata, ctmodelobj = myModel, objective = "Kalman")
# select DRIFT values:
regIndicators <- c("drift_eta2_eta1", "drift_eta1_eta2")
# Note: If you are unsure what the parameters are called in
# your model, check: fit_myModel$ctmodelobj$DRIFT for the drift or
# omxGetParameters(fit_myModel$ctmodelobj) for all parameters
## Optimization with GIST:
regModel <- regCtsem(ctsemObject = fit_myModel,
dataset = traindata,
regIndicators = regIndicators,
lambdas = "auto",
lambdasAutoLength = 20,
cvSample = testdata # data set for cross-validation
)
summary(regModel, criterion = "cvM2LL")
plot(regModel, what = "fit", criterion = "cvM2LL")
#### EXPERIMENTAL FEATURES: USE WITH CAUTION! ####
library(regCtsem)
## Example 4: Kalman Filter with person specific parameter values
## WARNING: THIS WILL TAKE A WHILE TO RUN
set.seed(175446)
## define the population model:
# set the drift matrix. Note that drift eta_1_eta2 is set to equal 0 in the population.
ct_drift <- matrix(c(-.3,0,.2,-.2),2,2,TRUE)
dataset <- c()
indpars <- c()
# We will simulate data for 10 individuals with person-specific parameters
# These person-specific parameters will then be regularized towards a
# common group parameter
for(i in 1:10){
while(TRUE){
DRIFT <- ct_drift + matrix(c(0,stats::rnorm(1,0,.5),0,0),2,2,TRUE)
if(!any(Re(eigen(DRIFT)$values) > 0)){break}
}
indpars <- c(indpars, DRIFT[1,2])
generatingModel<-ctsem::ctModel(Tpoints=50,n.latent=2,n.TDpred=0,n.TIpred=0,n.manifest=2,
MANIFESTVAR=diag(0,2),
LAMBDA=diag(1,2),
DRIFT=DRIFT,
DIFFUSION=matrix(c(.5,0,0,.5),2),
CINT=matrix(0,nrow = 2, ncol = 1),
T0MEANS=matrix(0,ncol=1,nrow=2),
T0VAR=diag(1,2), type = "omx")
dataset <- rbind(dataset, ctsem::ctGenerate(generatingModel,n.subjects = 1, wide = TRUE))
}
## Build the analysis model.
myModel <- ctsem::ctModel(Tpoints=50,n.latent=2,n.TDpred=0,n.TIpred=0,n.manifest=2,
LAMBDA=diag(1,2),
MANIFESTVAR=diag(0,2),
DIFFUSION=matrix(c('eta1_eta1',0,0,'eta2_eta2'),2),
T0MEANS=matrix(0,ncol=1,nrow=2),
T0VAR="auto", type = "omx")
myModel <- ctFit(myModel, dat = dataset, objective = "Kalman",
useOptimizer = TRUE)
regIndicators <- c("drift_eta2_eta1", "drift_eta1_eta2")
# the following parameters will be estimated person-specific and (as we specified this above)
# regularized. The regularization will be towards a group parameter
subjectSpecificParameters <- c("drift_eta2_eta1", "drift_eta1_eta2")
regModel <- regCtsem(ctsemObject = myModel,
dataset = dataset,
regIndicators = regIndicators,
lambdasAutoLength = 5, # 5 will not be enough, but this takes some time to execute
subjectSpecificParameters = subjectSpecificParameters
)
summary(regModel, criterion = "BIC")
plot(regModel, what = "drift")
# }