Projects

All my open-source projects are hosted on GitHub. The following is a short overview of a subset of these projects.

tablespan

Creating and sharing tables in R can be tedious and time-consuming. tablespan tries to make this process a bit easier by providing a “good enough” approach to tables. All you need is a single formula describing the table outline, and tablespan will do the rest. Tables can be exported to HTML, Excel, LaTeX, and RTF.

Here is a basic example:

library(tablespan)
library(dplyr)
data("mtcars")

summarized_table <- mtcars |>
  group_by(cyl, vs) |>
  summarise(N = n(),
            mean_hp = mean(hp),
            sd_hp = sd(hp),
            mean_wt = mean(wt),
            sd_wt = sd(wt))

tbl <- tablespan(data = summarized_table,
                 formula = Cylinder:cyl + Engine:vs ~
                   N +
                   (`Horse Power` = Mean:mean_hp + SD:sd_hp) +
                   (`Weight` = Mean:mean_wt + SD:sd_wt),
                 title = "Motor Trend Car Road Tests",
                 subtitle = "A table created with tablespan",
                 footnote = "Data from the infamous mtcars data set.")

as_gt(tbl = tbl)

Motor Trend Car Road Tests
A table created with tablespan
			Horse Power		Weight	
Cylinder	Engine	N	Mean	SD	Mean	SD
4	0	1	91		2.14
4	1	10	81.8	21.872	2.3	0.598
6	0	3	131.667	37.528	2.755	0.128
6	1	4	115.25	9.179	3.389	0.116
8	0	14	209.214	50.977	3.999	0.759
Data from the infamous mtcars data set.
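
The export step could look roughly like the following sketch. It assumes that as_gt() returns a regular gt table, that as_excel() returns an openxlsx workbook, and that gt::gtsave() picks the output format from the file extension; the file names and the use of a temporary directory are purely illustrative, and function names may differ between package versions.

```r
library(tablespan)
library(dplyr)

# A smaller version of the table above: summarize mtcars by cylinder count
summarized_table <- mtcars |>
  group_by(cyl) |>
  summarise(N = n(), mean_hp = mean(hp), sd_hp = sd(hp))

tbl <- tablespan(data = summarized_table,
                 formula = Cylinder:cyl ~
                   N +
                   (`Horse Power` = Mean:mean_hp + SD:sd_hp),
                 title = "Motor Trend Car Road Tests")

# Illustrative output location (assumption: any writable directory works)
out_dir <- tempdir()

# HTML, LaTeX, and RTF: convert to a gt table and let gt write the file
gt_tbl <- as_gt(tbl = tbl)
gt::gtsave(gt_tbl, filename = file.path(out_dir, "cars.html"))
gt::gtsave(gt_tbl, filename = file.path(out_dir, "cars.tex"))
gt::gtsave(gt_tbl, filename = file.path(out_dir, "cars.rtf"))

# Excel: create a workbook and save it with openxlsx
wb <- as_excel(tbl = tbl)
openxlsx::saveWorkbook(wb,
                       file = file.path(out_dir, "cars.xlsx"),
                       overwrite = TRUE)
```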

To learn more about tablespan, go to https://jhorzek.github.io/tablespan.

lessSEM

Much of my research focuses on estimating large Structural Equation Models (SEMs). Combining regularization and SEM was first proposed by Jacobucci et al. (2016). With lessSEM (lessSEM estimates sparse SEM), I created a very flexible approach to regularizing SEMs. The following table compares lessSEM with regsem and lslx, two alternative packages for estimating regularized SEMs:

	regsem	lslx	lessSEM
Model specification	based on lavaan	similar to lavaan	based on lavaan
Maximum likelihood estimation	Yes	Yes	Yes
Least squares estimation	No	Yes	Dev.
Categorical variables	No	Yes	No
Confidence Intervals	No	Yes	No
Missing Data	FIML	Auxiliary Variables	FIML
Multi-group models	No	Yes	Yes
Stability selection	Yes	No	Dev.
Mixed penalties	No	No	Yes
Equality constraints	Yes	No	Yes
Parameter transformations	diff_lasso	No	Yes
Definition variables	No	No	Yes

library(lessSEM)
library(lavaan)

# Like regsem, lessSEM builds on the lavaan
# package for model specification. The first step
# therefore is to implement the model in lavaan.

dataset <- simulateExampleData()

lavaanSyntax <- "
      f =~ l1*y1 + l2*y2 + l3*y3 + l4*y4 + l5*y5 + 
           l6*y6 + l7*y7 + l8*y8 + l9*y9 + l10*y10 + 
           l11*y11 + l12*y12 + l13*y13 + l14*y14 + l15*y15
      f ~~ 1*f
      "

lavaanModel <- lavaan::sem(lavaanSyntax,
                           data = dataset,
                           meanstructure = TRUE,
                           std.lv = TRUE)

# Regularization:

lsem <- lasso(
  # pass the fitted lavaan model
  lavaanModel = lavaanModel,
  # names of the regularized parameters:
  regularized = c("l6", "l7", "l8", "l9", "l10",
                  "l11", "l12", "l13", "l14", "l15"),
  # in case of lasso and adaptive lasso, we can specify the number of lambda
  # values to use. lessSEM will automatically find lambda_max and fit
  # models for nLambdas values between 0 and lambda_max. For the other
  # penalty functions, lambdas must be specified explicitly.
  nLambdas = 50)

# use the plot-function to plot the regularized parameters:
plot(lsem)
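
Because the lasso fits one model per lambda value, a final model still has to be picked. The sketch below shows one way this might be done, assuming that fitIndices() returns the fit measures for all lambda values and that coef() accepts a criterion argument such as "BIC". The model is the same as above, only written more compactly with paste0().

```r
library(lessSEM)
library(lavaan)

# Same setup as above, condensed: one factor with 15 labeled loadings
dataset <- simulateExampleData()
lavaanSyntax <- paste0(
  "f =~ ", paste0("l", 1:15, "*y", 1:15, collapse = " + "), "\n",
  "f ~~ 1*f"
)
lavaanModel <- lavaan::sem(lavaanSyntax,
                           data = dataset,
                           meanstructure = TRUE,
                           std.lv = TRUE)

lsem <- lasso(lavaanModel = lavaanModel,
              regularized = paste0("l", 6:15),
              nLambdas = 50)

# Fit measures (e.g., AIC and BIC) for the models on the lambda grid
fits <- fitIndices(lsem)
head(fits)

# Parameter estimates of the model with the lowest BIC
coef(lsem, criterion = "BIC")
```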

Go to https://jhorzek.github.io/lessSEM for a full introduction.

mxsem

Structural Equation Models (SEMs) are extremely flexible. The R package OpenMx is one of the most versatile frameworks to implement SEMs. However, this versatility comes at a price: The model specification can be challenging for users new to SEM or new to R. mxsem unlocks some of the features that OpenMx provides with a lavaan-like syntax.

Here is an example in which we fit an SEM with a definition variable (k), so that the regression of eta on xi can differ between persons:

library(mxsem)
set.seed(9820)
dataset <- simulate_moderated_nonlinear_factor_analysis(N = 100)
head(dataset, n = 3)

             x1         x2         x3         y1         y2         y3 k
[1,] -1.2166034 -1.2374549 -1.3731943 -1.0101868 -0.8296293 -1.2300555 0
[2,]  1.1911346  0.9971499  1.0226322  0.8604803  0.4509088  0.6052786 1
[3,] -0.7777169 -0.4725291 -0.8507347 -1.0958285 -0.5035753 -0.8048378 0

# Fit model
model <- "
  # loadings
     xi  =~ x1 + x2 + x3
     eta =~ y1 + y2 + y3
  # regression
     eta ~ {a := a0 + a1*data.k} * xi
"

fit_mx <- mxsem(model = model,
                data  = dataset) |>
  mxTryHard()

summary(fit_mx)

Summary of untitled2 
 
free parameters:
      name         matrix row col    Estimate   Std.Error A lbound ubound
1    xi→x2              A  x2  xi  0.79157834 0.026246075                
2    xi→x3              A  x3  xi  0.89166096 0.027991561                
3   eta→y2              A  y2 eta  0.81610407 0.028977400                
4   eta→y3              A  y3 eta  0.90741887 0.027924367                
5    x1↔x1              S  x1  x1  0.04060225 0.011022294       0!       
6    x2↔x2              S  x2  x2  0.04519856 0.008621611       0!       
7    x3↔x3              S  x3  x3  0.04647179 0.010143732 !     0!       
8    y1↔y1              S  y1  y1  0.03388960 0.008495340       0!       
9    y2↔y2              S  y2  y2  0.04210942 0.007766681 !     0!       
10   y3↔y3              S  y3  y3  0.03107014 0.007268282 !     0!       
11   xi↔xi              S  xi  xi  1.07304561 0.157789135    1e-06       
12 eta↔eta              S eta eta  0.26127597 0.041232528    1e-06       
13  one→x1              M   1  x1 -0.14881020 0.105058448                
14  one→x2              M   1  x2 -0.10969665 0.084339842                
15  one→x3              M   1  x3 -0.15448447 0.094427398                
16  one→y1              M   1  y1 -0.05304587 0.089762240                
17  one→y2              M   1  y2 -0.13040811 0.074579740                
18  one→y3              M   1  y3 -0.05666212 0.081648161                
19      a0 new_parameters   1   1  0.78168077 0.069381154                
20      a1 new_parameters   1   2 -0.19334124 0.107741726                

Model Statistics: 
               |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
       Model:             20                      7              475.3822
   Saturated:             27                      0                    NA
Independence:             12                     15                    NA
Number of observations/statistics: 100/27

Information Criteria: 
      |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
AIC:       461.3822               515.3822                 526.0151
BIC:       443.1460               567.4856                 504.3206
To get additional fit indices, see help(mxRefModels)
timestamp: 2025-04-18 15:56:56 
Wall clock time: 0.04190588 secs 
optimizer:  SLSQP 
OpenMx version number: 2.21.13 
Need help?  See help(mxSummary)
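
To read the result, it can help to plug the estimates back into the algebra a := a0 + a1*data.k. The following base R sketch copies a0 and a1 from the summary above and computes the implied regression coefficient for k = 0 and k = 1, the two values of k visible in the data preview; the helper function effect_of_xi is hypothetical and not part of mxsem.

```r
# Estimates of a0 and a1, copied from the summary output above
a0 <- 0.78168077
a1 <- -0.19334124

# The model defines a := a0 + a1*data.k, so each person gets the
# regression coefficient implied by their own value of k
effect_of_xi <- function(k) a0 + a1 * k

effect_of_xi(k = 0)  # 0.7816808
effect_of_xi(k = 1)  # 0.5883395
```

In this sample, the estimated effect of xi on eta is therefore somewhat smaller for persons with k = 1 than for persons with k = 0.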

A thorough introduction to the package can be found at https://jhorzek.github.io/mxsem.

R Lernplattform

Learning R is hard. During my teaching assignments at the Humboldt-Universität zu Berlin, I contributed to the R Lernplattform, which teaches R to psychology students (currently only in German).

lesstimate

lesstimate (lesstimate estimates sparse estimates) is a C++ header-only library that lets you combine statistical models such as linear regression with state-of-the-art penalty functions (e.g., lasso, elastic net, scad). With lesstimate you can add regularization and variable selection procedures to your existing modeling framework. It is currently used in lessSEM to regularize structural equation models.
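
To give an idea of what combining a model with a penalty function means, here is a small base R illustration. It does not use lesstimate or its C++ API; it is a sketch that fits a lasso-penalized linear regression with a simple proximal gradient method (ISTA), chosen here only because it is short. The function names and the simulated data are made up for this example.

```r
# Soft-thresholding operator: the proximal map of the lasso penalty
soft_threshold <- function(x, threshold) {
  sign(x) * pmax(abs(x) - threshold, 0)
}

# Lasso-penalized least squares fitted with proximal gradient descent.
# Objective: 1/(2n) * sum((y - X %*% b)^2) + lambda * sum(abs(b))
lasso_ista <- function(X, y, lambda, n_iter = 2000) {
  n <- nrow(X)
  b <- rep(0, ncol(X))
  # Step size 1/L, where L is the Lipschitz constant of the gradient
  L <- max(eigen(crossprod(X) / n, symmetric = TRUE,
                 only.values = TRUE)$values)
  for (i in seq_len(n_iter)) {
    gradient <- -drop(crossprod(X, y - X %*% b)) / n
    b <- soft_threshold(b - gradient / L, lambda / L)
  }
  b
}

# Simulated data: only the first two predictors have an effect
set.seed(1)
X <- matrix(rnorm(200 * 5), nrow = 200)
y <- drop(X %*% c(2, -1, 0, 0, 0)) + rnorm(200)

b_unpenalized <- lasso_ista(X, y, lambda = 0)    # ordinary least squares
b_lasso       <- lasso_ista(X, y, lambda = 0.3)  # shrunken estimates

round(cbind(b_unpenalized, b_lasso), 2)
```

The penalty shrinks all coefficients toward zero and can set small ones to exactly zero, which is what makes the lasso usable for variable selection.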

A thorough introduction to the package is provided at https://jhorzek.github.io/lesstimate.