regCtsem — regCtsem • regCtsem

Regularized Continuous Time Structural Equation Modeling.

regCtsem uses objects from ctsemOMX and implements lasso, adaptive lasso and ridge regularization.

Usage

regCtsem(
  ctsemObject,
  dataset,
  regIndicators,
  targetVector = NULL,
  lambdas = "auto",
  lambdasAutoLength = 50,
  lambdasAutoCurve = 10,
  penalty = "lasso",
  adaptiveLassoWeights = NULL,
  adaptiveLassoPower = -1,
  cvSample = NULL,
  autoCV = "No",
  k = 5,
  sparseParameters = NULL,
  subjectSpecificParameters = NULL,
  standardizeDrift = "No",
  scaleLambdaWithN = TRUE,
  returnFitIndices = TRUE,
  BICWithNAndT = FALSE,
  optimization = "exact",
  optimizer = "GIST",
  control = list(),
  verbose = 0,
  trainingWheels = TRUE
)

Arguments

ctsemObject: Fitted object of class ctsemFit
dataset: Data set in wide format compatible with ctsemOMX
regIndicators: Labels of the regularized parameters (e.g. drift_eta1_eta2).
targetVector: named vector with values towards which the parameters are regularized (Standard is regularization towards zero)
lambdas: vector of penalty values (tuning parameter). E.g., seq(0,1,.01). Alternatively, lambdas can be set to "auto". regCtsem will then compute an upper limit for lambda and test lambdasAutoLength increasing lambda values
lambdasAutoLength: if lambdas == "auto", lambdasAutoLength will determine the number of lambdas tested.
lambdasAutoCurve: It is often a good idea to have unequally spaced lambda steps (e.g., .01,.02,.05,1,5,20). If lambdasAutoCurve is close to 1 lambda values will be equally spaced, if lambdasAutoCurve is large lambda values will be more concentrated close to 0. See ?getCurvedLambda for more informations.
penalty: Currently supported are lasso, ridge and adaptiveLasso
adaptiveLassoWeights: weights for the adaptive lasso. Defaults to 1/(|theta|^adaptiveLassoPower), where theta is the maximum likelihood estimate of the regularized parameters.
adaptiveLassoPower: power for the adaptive lasso weights. The weights will be set to 1/(|theta|^adaptiveLassoPower).
cvSample: cross-validation sample. Has to be in wide format and compatible with ctsemOMX
autoCV: Should automatic cross-validation be used? Possible are "No", "kFold" or "Blocked". kFold splits the dataset in k groups by selecting independent units from the rows. Blocked is a within-unit split, where for each person blocks of observations are deleted. See Bulteel, K., Mestdagh, M., Tuerlinckx, F., & Ceulemans, E. (2018). VAR(1) based models do not always outpredict AR(1) models in typical psychological applications. Psychological Methods, 23(4), 740-756. https://doi.org/10.1037/met0000178 for a more detailed explanation
k: number of cross-validation folds if autoCV = "kFold" or autoCV = "Blocked"
sparseParameters: labeled vector with parameter estimates of the most sparse model. If regValues = "auto" the sparse parameters will be computed automatically.
subjectSpecificParameters: EXPERIMENTAL! A vector of parameter labels for parameters which should be estimated person-specific. If these parameter labels are also passed to regIndicators, all person-specific parameters will be regularized towards a group-parameter. This is a 2-step-procedure: In step 1 all parameters are constrained to equality between individuals to estimate the group parameters. In step 2 the parameters are estimated person-specific, but regularized towards the group parameter from step 1.
standardizeDrift: Should Drift parameters be standardized automatically? Set to 'No' for no standardization, 'T0VAR' for standardization using the T0VAR or 'asymptoticDiffusion' for standardization using the asymptotic diffusion
scaleLambdaWithN: Boolean: Should the penalty value be scaled with the sample size? True is recommended as the likelihood is also sample size dependent
returnFitIndices: Boolean: should fit indices be returned?
BICWithNAndT: Boolean: TRUE = Use N and T in the formula for the BIC (-2log L + log(N+T)*k, where k is the number of parameters in the model). FALSE = Use N in the formula for the BIC (-2log L + log(N)). Defaults to FALSE
optimization: which optimization procedure should be used. Possible are "exact" or "approx". exact is recommended for sparsity inducing penalty functions (lasso and adaptive lasso)
optimizer: for exact optimization: Either GIST or GLMNET. When using optimization = "approx", Rsolnp or any of the optimizers in optimx can be used. See ?optimx
control: List with control arguments for the optimizer. See ?controlGIST, ?controlGLMNET and ?controlApprox for the respective parameters
verbose: 0 (default), 1 for convergence plot, 2 for parameter convergence plot and line search progress.
trainingWheels: If set to FALSE all bells and whistles used to keep regCtsem on track are turned off (no multiple starting values, no initial optimization with solnp or optimx). The focus is speed instead of accuracy. This might work in simulated data, but is NOT recommended with real data. The optimizer is quite likely to get stuck in local minima.

Value

returns an object of class regCtsem. Without cross-validation, this object will have the fields setup (all arguments passed to the function), fitAndParameters (used internally to store the fit and the raw (i.e., untransformed) parameters), fit (fit indices, ect.), parameterEstimatesRaw (raw, i.e. untransformed parameters; used internally), and parameters (transformed parameters)

Details

REGularized Continuous Time Structural Equation Models (regCtsem) implements least absolute shrinkage and selection operator (LASSO; Tibshirani, 1996) and adaptive LASSO (Zou, 2006) regularization for models fitted with ctsemOMX (Driver et al., 2017). See vignette("regCtsem", package = "regCtsem") for an introduction and additional examples.

References:

* Driver, C. C., Oud, J. H. L., & Voelkle, M. C. (2017). Continuous Time Structural Equation Modelling With R Package ctsem. Journal of Statistical Software, 77(5), 1-36. https://doi.org/10.18637/jss.v077.i05

* Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267-288.

* Zou, H. (2006). The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101(476), 1418-1429. https://doi.org/10.1198/016214506000000735

NOTE: Function located in file regCtsem.R

Author

Jannik Orzek

Jannik Orzek <orzek@mpib-berlin.mpg.de>

Examples

# \donttest{

set.seed(17046)

library(regCtsem)

#### Example 1 ####
## Regularization with FIML objective function

## Population model:

# set the drift matrix. Note that drift eta_1_eta2 is set to equal 0 in the population.
ct_drift <- matrix(c(-.3,.2,0,-.5), ncol = 2)

generatingModel<-ctsem::ctModel(Tpoints=10,n.latent=2,n.TDpred=0,
                                n.TIpred=0,n.manifest=2,
                                MANIFESTVAR=diag(0,2),
                                LAMBDA=diag(1,2),
                                DRIFT=ct_drift,
                                DIFFUSION=matrix(c(.5,0,0,.5),2),
                                CINT=matrix(c(0,0),nrow=2),
                                T0MEANS=matrix(0,ncol=1,nrow=2),
                                T0VAR=diag(1,2), type = "omx")

# simulate a training data set
dat <- ctsem::ctGenerate(generatingModel, n.subjects = 100, wide = TRUE)

## Build the analysis model. Note that drift eta1_eta2 is freely estimated
# although it is 0 in the population.
myModel <- ctsem::ctModel(Tpoints=10,n.latent=2,n.TDpred=0,
                          n.TIpred=0,n.manifest=2,
                          LAMBDA=diag(1,2),
                          MANIFESTVAR=diag(0,2),
                          CINT=matrix(c(0,0),nrow=2),
                          DIFFUSION=matrix(c('eta1_eta1',0,0,'eta2_eta2'),2),
                          T0MEANS=matrix(0,ncol=1,nrow=2),
                          T0VAR="auto", type = "omx")

# fit the model using ctsemOMX:
fit_myModel <- ctsemOMX::ctFit(dat, myModel)

# select DRIFT values for regularization:
# Note: If you are unsure what the parameters are called in
# your model, check: showParameters(fit_myModel)
showParameters(fit_myModel)

# regularize the cross-effects:
regIndicators <- c("drift_eta2_eta1", "drift_eta1_eta2")

# Optimize model using GIST with lasso penalty
regModel <- regCtsem(ctsemObject = fit_myModel,
                               dataset = dat,
                               regIndicators = regIndicators,
                               lambdas = "auto",
                               lambdasAutoLength = 20)
summary(regModel)
plot(regModel)
plot(regModel, what = "fit", criterion = c("AIC", "BIC", "m2LL"))

# The best parameter estimates and the final model as mxObject can be extracted with:
# getFinalParameters(regCtsemObject = regModel, criterion = "BIC")
# bestModel <- getFinalModel(regCtsemObject = regModel, criterion = "BIC")
# WARNING: The returned model is of type cpptsem. You can access it's elements with the
# $ operator. For example: bestModel$DRIFTValues

# WARNING: If you load an existing regCtsem object, the underlying C++ model will no longer
# exist. You can restore this model with restore(). Example:
# save(regModel, file = "regModel.RData")
# load("regModel.RData")
# regModel <- restore(regModel)
# regModel$setup$cpptsemObject$DRIFTValues

# Optimize model using GLMNET with lasso penalty
regModel <- regCtsem(ctsemObject = fit_myModel,
                               dataset = dat,
                               regIndicators = regIndicators,
                               lambdas = "auto",
                               lambdasAutoLength = 20,
                               optimizer = "GLMNET")

summary(regModel, criterion = "BIC")
plot(regModel, what = "drift")
plot(regModel, what = "fit", criterion = c("AIC", "BIC", "m2LL"))
plot(regModel, what = "drift_eta1_eta2")

# The same regularization can be performed with the approximate optimization:
regModelApprox <- regCtsem(ctsemObject = fit_myModel,
                                     dataset = dat,
                                     regIndicators = regIndicators,
                                     lambdas = "auto",
                                     lambdasAutoLength = 20,
                                     optimization = "approx",
                                     control = list(
                                       epsilon = .001, # epsilon is used to transform the non-differentiable
                                       #lasso penalty to a differentiable one if optimization = approx
                                       zeroThresh = .04 # threshold below which parameters will be evaluated as == 0
                                     ))

# Comparison of parameter estimates:
round(regModel$fitAndParameters - regModelApprox$fitAndParameters,4)

#### Example 2 ####
## Regularization with Kalman objective function

set.seed(175446)

## define the population model:

# set the drift matrix. Note that drift eta_1_eta2 is set to equal 0 in the population.
ct_drift <- matrix(c(-.3,.2,0,-.5), ncol = 2)

generatingModel<-ctsem::ctModel(Tpoints=100,n.latent=2,
                                n.TDpred=0,n.TIpred=0,n.manifest=2,
                                MANIFESTVAR=diag(0,2),
                                LAMBDA=diag(1,2),
                                DRIFT=ct_drift,
                                DIFFUSION=matrix(c(.5,0,0,.5),2),
                                CINT=matrix(c(0,0),nrow=2),
                                T0MEANS=matrix(0,ncol=1,nrow=2),
                                T0VAR=diag(1,2), type = "omx")

# simulate a training data and testing data set
traindata <- ctsem::ctGenerate(generatingModel,n.subjects = 10, wide = TRUE)
testdata <- ctsem::ctGenerate(generatingModel,n.subjects = 10, wide = TRUE)

## Build the analysis model. Note that drift eta1_eta2 is freely estimated
# although it is 0 in the population.
myModel <- ctsem::ctModel(Tpoints=100,n.latent=2,n.TDpred=0,
                          n.TIpred=0,n.manifest=2,
                          LAMBDA=diag(1,2),
                          MANIFESTVAR=diag(0,2),
                          CINT=matrix(c(0,0),nrow=2),
                          DIFFUSION=matrix(c('eta1_eta1',0,0,'eta2_eta2'),2),
                          T0MEANS=matrix(0,ncol=1,nrow=2),
                          T0VAR="auto", type = "omx")
fit_myModel <- ctFit(dat = traindata, ctmodelobj = myModel, objective = "Kalman")

# select DRIFT values:
regIndicators <- c("drift_eta2_eta1", "drift_eta1_eta2")
# Note: If you are unsure what the parameters are called in
# your model, check: fit_myModel$ctmodelobj$DRIFT for the drift or
# omxGetParameters(fit_myModel$ctmodelobj) for all parameters

## Optimization with GIST:
regModel <- regCtsem(ctsemObject = fit_myModel,
                               dataset = traindata,
                               regIndicators = regIndicators,
                               lambdas = "auto",
                               lambdasAutoLength = 20,
                               cvSample = testdata # data set for cross-validation
)

summary(regModel, criterion = "cvM2LL")
plot(regModel, what = "fit", criterion = "cvM2LL")

#### EXPERIMENTAL FEATURES: USE WITH CAUTION! ####

library(regCtsem)

## Example 4: Kalman Filter with person specific parameter values
## WARNING: THIS WILL TAKE A WHILE TO RUN
set.seed(175446)

## define the population model:

# set the drift matrix. Note that drift eta_1_eta2 is set to equal 0 in the population.
ct_drift <- matrix(c(-.3,0,.2,-.2),2,2,TRUE)
dataset <- c()
indpars <- c()

# We will simulate data for 10 individuals with person-specific parameters
# These person-specific parameters will then be regularized towards a
# common group parameter
for(i in 1:10){
  while(TRUE){
    DRIFT <- ct_drift + matrix(c(0,stats::rnorm(1,0,.5),0,0),2,2,TRUE)
    if(!any(Re(eigen(DRIFT)$values) > 0)){break}
  }
  indpars <- c(indpars, DRIFT[1,2])
  generatingModel<-ctsem::ctModel(Tpoints=50,n.latent=2,n.TDpred=0,n.TIpred=0,n.manifest=2,
                                  MANIFESTVAR=diag(0,2),
                                  LAMBDA=diag(1,2),
                                  DRIFT=DRIFT,
                                  DIFFUSION=matrix(c(.5,0,0,.5),2),
                                  CINT=matrix(0,nrow = 2, ncol = 1),
                                  T0MEANS=matrix(0,ncol=1,nrow=2),
                                  T0VAR=diag(1,2), type = "omx")
  dataset <- rbind(dataset, ctsem::ctGenerate(generatingModel,n.subjects = 1, wide = TRUE))

}

## Build the analysis model.
myModel <- ctsem::ctModel(Tpoints=50,n.latent=2,n.TDpred=0,n.TIpred=0,n.manifest=2,
                          LAMBDA=diag(1,2),
                          MANIFESTVAR=diag(0,2),
                          DIFFUSION=matrix(c('eta1_eta1',0,0,'eta2_eta2'),2),
                          T0MEANS=matrix(0,ncol=1,nrow=2),
                          T0VAR="auto", type = "omx")
myModel <- ctFit(myModel, dat = dataset, objective = "Kalman",
                 useOptimizer = TRUE)
regIndicators <- c("drift_eta2_eta1", "drift_eta1_eta2")
# the following parameters will be estimated person-specific and (as we specified this above)
# regularized. The regularization will be towards a group parameter
subjectSpecificParameters <- c("drift_eta2_eta1", "drift_eta1_eta2")
regModel <- regCtsem(ctsemObject = myModel,
                     dataset = dataset,
                     regIndicators = regIndicators,
                     lambdasAutoLength = 5, # 5 will not be enough, but this takes some time to execute
                     subjectSpecificParameters = subjectSpecificParameters
)
summary(regModel, criterion = "BIC")
plot(regModel, what = "drift")
# }