The Dire `mml` fitting procedures were sped up, and the memory footprint was reduced so that data with more rows and more columns can be fit simultaneously. A significant improvement is the move from the poorly scaling Newton's method to the expectation maximization (EM) method, which scales better for large numbers of covariates.
Additional speed ups for larger datasets were afforded by moving data preparation from base R to `data.table`, `tidyr`, and `dplyr`.
The Quasi-Newton (QN) methods now use the `lbfgs` package, which offers superior convergence checks and removes the need for a few slow Newton steps to assure convergence.
An additional new feature is built-in information reduction, which provides further speed ups. This allows users to fit a model more similar to large-scale assessments (LSAs) such as NAEP, PISA, and TIMSS by fitting only enough principal components of the covariate matrix to retain a user-specified proportion of the variance in the design matrix (usually called the X matrix in a regression specification). Users can use the `retainedInformation` argument in `mml` to take advantage of this feature.
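As a sketch of how this might look, the call below keeps enough principal components to retain 90% of the design-matrix variance; the formula, data objects, and ID variable are placeholders, and only `retainedInformation` is the new argument:

```r
library(Dire)
# Illustrative: retain enough principal components of the design
# matrix to keep 90% of its variance. stuDat, dichotParamTab, and
# the formula are placeholders for the user's own inputs.
fit <- mml(composite ~ dsex + sdracem,
           stuDat = stuDat,
           dichotParamTab = dichotParamTab,
           idVar = "sid",
           retainedInformation = 0.90)
```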
The code now displays more status output using the `cli` package, which helps users monitor the progress of the fit.
Previously, `mml` required users to generate a wide file with covariates and a long file with one row per student per item. The new API preserves this option but also allows the user to simply pass one wide file that includes the items. However, the `stuItems` argument is now deprecated and may be removed at a future date because it is easier to simply pass a wide file.
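A sketch of the wide-file call, under the assumption that the item responses are simply columns on the `stuDat` data frame:

```r
# Illustrative: one wide file per student, with covariates and item
# response columns together; no separate stuItems file is needed.
fit <- mml(composite ~ dsex + sdracem,
           stuDat = wideData,
           dichotParamTab = dichotParamTab,
           idVar = "sid")
```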
There is an `optimizer` argument that allows users to select between the EM and QN methods. In most cases EM is faster, but users can experiment to see whether QN is faster for their case.
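For example, reusing the placeholder inputs from the earlier sketches; the values `"EM"` and `"QN"` are assumptions here, so check `?mml` for the accepted values:

```r
# Illustrative: rerun the same model with each optimizer and compare
# timings; the accepted values of optimizer are assumed here.
system.time(fitEM <- mml(composite ~ dsex, stuDat = stuDat,
                         dichotParamTab = dichotParamTab,
                         idVar = "sid", optimizer = "EM"))
system.time(fitQN <- mml(composite ~ dsex, stuDat = stuDat,
                         dichotParamTab = dichotParamTab,
                         idVar = "sid", optimizer = "QN"))
```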
The `calcCor` argument used to allow users to calculate a composite without covariances between the elements. This is not used in any LSA, so it is deprecated. When a composite is estimated, the correlations between the subscales are now always calculated automatically under the hood.
Previously, the `fast` argument allowed the user to optionally use faster C++ code. The faster C++ code is now well tested and always used, so the `fast` argument is deprecated.
Because the principal component analysis makes the formation of the covariate matrix more complicated, returning the X (design) matrix was replaced with a function, `getX`. The `getX` function takes student data (`stuDat`) and returns the design matrix for that data, or its principal components if the principal component analysis was used in the original call. This also reduces the memory footprint of the result. The `getX` function also manages relevels of covariates.
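A sketch of the intended use; the exact signature of `getX` (a fitted model plus a `stuDat` data frame, as shown here) is an assumption based on the description above:

```r
# Illustrative: reconstruct the design matrix (or its principal
# components, if PCA was used in the original fit) for new data.
X <- getX(fit, stuDat = newStuDat)
dim(X)
```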
All summaries now use the gradient approximation to the Hessian. This is because of substantial performance improvements made to the gradient calculation and the intractability of calculating a full Hessian for a system with a large number of covariates.
To save memory, a composite fit no longer returns a separate copy of the dataset for each subscale, filtered to the students who saw items in that subscale; it now returns the full dataset only once.
For generalized partial credit model (GPCM) items, the `polyParamTab` argument must now have a `d0` column that is always zero. Previously this column was added silently. This change is intended to link the vignette description of the GPCM, the inputs, and the outputs more clearly to each other.
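For example, an existing table can be brought into compliance with one line (the table itself and its other columns are the user's own):

```r
# Illustrative: GPCM items now require an explicit d0 column of zeros.
polyParamTab$d0 <- 0
```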
Also, for GPCM items, the number of score points is now consistent with the general psychometric understanding that an item with three possible scores (incorrect, partially correct, and correct) has two score points.
`drawPVs` now always uses the more defensible `stochasticBeta=TRUE`, based on simulation results that we will share at NCME 2025.
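A sketch of drawing plausible values under the new behavior; the `npv` argument name is an assumption, and `stochasticBeta` no longer needs to be set:

```r
# Illustrative: draw five plausible values from a fitted mml model;
# stochasticBeta = TRUE is now always used, so it is not passed.
pvs <- drawPVs(fit, npv = 5)
```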
Functions are clearer about some errors when data sets do not agree. In particular, when a PSU/stratum variable is missing and Taylor series variance estimation is selected, a plain-language error is given.
Turning on `multiCore` now fits the latent regressions with multiple cores too. Previously, only the covariance matrix was fit with multiple cores.
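For example, a sketch reusing the placeholder inputs from the earlier examples:

```r
# Illustrative: with multiCore on, both the latent regressions and
# the covariance matrix are now fit on multiple cores.
fit <- mml(composite ~ dsex, stuDat = stuDat,
           dichotParamTab = dichotParamTab, idVar = "sid",
           multiCore = TRUE)
```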
Added a check for nearly singular models.
Allow `stuDat` to include students who are not in the item data without throwing an error.
Optimization tries to avoid Newton's method by using the `lbfgs` package, which allows a condition on the gradient to be set; Newton's method can be very slow for large datasets.
When Newton's method is used, the output is more verbose.
The C++ implementation of the Hessian has been sped up a bit.
Fixed a bug in the degrees of freedom replication in composite estimation; this had caused `summary` to fail in many cases.
Fixed a version number error in this file: the 2.1.0 changes had previously been labeled 2.0.0.
Added degrees of freedom and p-values to `mml` results.
`mml` should be faster now.
Added `drawPVs` functions that draw plausible values from a normal approximation to the posterior distribution. See the `drawPVs` help for details.
The object returned by `mml` now includes an element, `itemScorePoints`, that shows, for each item, the expected and actually occupied score points.
If items have invalid score points, an error now shows the `itemScorePoints` table.
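For example, assuming the element is accessed directly off the returned object:

```r
# Illustrative: expected vs. actually occupied score points per item.
fit$itemScorePoints
```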
The `mml` function used to use the `bobyqa` optimizer; it now uses a combination of the `optim` function and then a Newton's method optimizer.
The `mml` function's Taylor series covariance calculation for composite results has been updated so that the correlation is calculated for all subscales simultaneously. This results in a covariance matrix that is always positive definite. The old method can be used by requesting "Partial Taylor".