This document compares mclust and Mplus output, for the purpose of benchmarking the use of mclust, an open-source tool, to Mplus.`estimate_profiles()`

uses the mclust R package, while `estimate_profiles_mplus()`

generates Mplus model syntax which is then run with Mplus. Note that for the `estimate_profiles_mplus()`

function to work, you must have Mplus installed (and purchased).

```
library(tidyLPA)
#> tidyLPA provides the functionality to carry out Latent Profile Analysis. Note that tidyLPA is still at the beta stage!
#> Please report any bugs at https://github.com/jrosen48/tidyLPA or send an email to jrosen@msu.edu.
```

The functions that use MPlus rely on the MplusAutomation R package (and MPlus) to run models from R. For example, let’s run the following function:

```
m1_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 1)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 488.915
#> BIC is 1042.968
#> Entropy is 0.991
```

Running this generates four files necessary for running an LPA model through MPlus: 1. a data file (`.dat`

) 1. an input / syntax file (`.inp`

) 1. (from running the model through MPlus) an output file (`.out`

) 1. (also from running the model through MPlus) a modified data file, with the class probabilities and class (profile) assignments (also `.dat`

, but with a different file name than the data file used in the input)

Here, for instance, is the input file dynamically generated by tidyLPA by the function above:

```
TITLE: test
DATA: File is d.dat;
VARIABLE:
Names are Sepal_Length Sepal_Width Petal_Length Petal_Width;
Classes = c(2);
MODEL:
%overall%
[Sepal_Length Sepal_Width Petal_Length Petal_Width];
Sepal_Length Sepal_Width Petal_Length Petal_Width;
Sepal_Length WITH Sepal_Width@0;
Sepal_Length WITH Petal_Length@0;
Sepal_Length WITH Petal_Width@0;
Sepal_Width WITH Petal_Length@0;
Sepal_Width WITH Petal_Width@0;
Petal_Length WITH Petal_Width@0;
%c#1%
[Sepal_Length Sepal_Width Petal_Length Petal_Width];
Sepal_Length Sepal_Width Petal_Length Petal_Width(1-4);
Sepal_Length WITH Sepal_Width@0;
Sepal_Length WITH Petal_Length@0;
Sepal_Length WITH Petal_Width@0;
Sepal_Width WITH Petal_Length@0;
Sepal_Width WITH Petal_Width@0;
Petal_Length WITH Petal_Width@0;
%c#2%
[Sepal_Length Sepal_Width Petal_Length Petal_Width];
Sepal_Length Sepal_Width Petal_Length Petal_Width(1-4);
Sepal_Length WITH Sepal_Width@0;
Sepal_Length WITH Petal_Length@0;
Sepal_Length WITH Petal_Width@0;
Sepal_Width WITH Petal_Length@0;
Sepal_Width WITH Petal_Width@0;
Petal_Length WITH Petal_Width@0;
ANALYSIS:
Type is mixture;
start = 20 4;
miterations = 500;
stiterations = 10;
convergence = 1e-06;
processors = 1;
OUTPUT: TECH1 TECH11;
SAVEDATA: File is d-mod.dat;
SAVE = CPROBABILITIES;
```

Along with the data file that is also automatically generated, this input file is the sent to MPlus (you don’t have to open MPlus or do anything - the MplusAutomation package makes this both possible and easy), and then the output file is returned.

We can see how this input file dynamically changes if we change the function’s arguments by removing a variable, increasing the number of profiles, and changing the model that is specified:

```
m1_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length,
n_profiles = 3,
model = 2, remove_tmp_files=F)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 308.07
#> BIC is 701.32
#> Entropy is 0.917
```

```
TITLE: test
DATA: File is d.dat;
VARIABLE:
Names are Sepal_Length Sepal_Width Petal_Length;
Classes = c(3);
MODEL:
%overall%
[Sepal_Length Sepal_Width Petal_Length];
Sepal_Length Sepal_Width Petal_Length;
Sepal_Length WITH Sepal_Width;
Sepal_Length WITH Petal_Length;
Sepal_Width WITH Petal_Length;
%c#1%
[Sepal_Length Sepal_Width Petal_Length];
Sepal_Length Sepal_Width Petal_Length(1-3);
Sepal_Length WITH Sepal_Width(4);
Sepal_Length WITH Petal_Length(5);
Sepal_Width WITH Petal_Length(6);
%c#2%
[Sepal_Length Sepal_Width Petal_Length];
Sepal_Length Sepal_Width Petal_Length(1-3);
Sepal_Length WITH Sepal_Width(4);
Sepal_Length WITH Petal_Length(5);
Sepal_Width WITH Petal_Length(6);
%c#3%
[Sepal_Length Sepal_Width Petal_Length];
Sepal_Length Sepal_Width Petal_Length(1-3);
Sepal_Length WITH Sepal_Width(4);
Sepal_Length WITH Petal_Length(5);
Sepal_Width WITH Petal_Length(6);
ANALYSIS:
Type is mixture;
start = 20 4;
miterations = 500;
stiterations = 10;
convergence = 1e-06;
processors = 1;
OUTPUT: TECH1 TECH11;
SAVEDATA: File is d-mod.dat;
SAVE = CPROBABILITIES;
```

Before proceeding, I will note that the functions that use MPlus are still under-development; dynamically generating the input files is technically demanding, and it is possible that these functions will change significantly in later versions. Nevertheless, they work well with testing using both built-in data (like the iris and geyser datasets used in this document) as well as a range of other data.

In the rest of this document, Mplus and mclust are benchmarked to one another. For ease of comparison, the estimated log-likelihood; if these are identical it probably means all of the estimated paramteters are identical (other benchmarks we have done with Mplus and mclust have demonstrated this).

**Mplus**

```
m1_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 1)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 488.915
#> BIC is 1042.968
#> Entropy is 0.991
m2_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 2)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 296.448
#> BIC is 688.097
#> Entropy is 1
m3_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 3)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 386.185
#> BIC is 857.551
#> Entropy is 1
m5_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 6)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 214.355
#> BIC is 574.018
#> Entropy is 1
```

**mclust**

Note that model 4 can only be specified in Mplus, so is not included here.

```
m1_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 1)
#> Fit varying means, equal variances, covariances fixed to 0 (Model 1) model with 2 profiles.
#> LogLik is 488.915
#> BIC is 1042.968
#> Entropy is 0.998
m2_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 2)
#> Fit varying means, equal variances and covariances (Model 2) model with 2 profiles.
#> LogLik is 296.448
#> BIC is 688.097
#> Entropy is 1
m3_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 3)
#> Fit varying means and variances, covariances fixed to 0 (Model 3) model with 2 profiles.
#> LogLik is 386.185
#> BIC is 857.551
#> Entropy is 1
m5_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 6)
#> Fit varying means, variances, and covariances (Model 4) model with 2 profiles.
#> LogLik is 214.355
#> BIC is 574.018
#> Entropy is 1
```

Here is a summary of the log-likelihood statistics:

```
library(tibble)
iris_log_lik <- tribble(
~model, ~mplus, ~mclust,
1, 488.915, 488.915,
2, 296.448, 296.448,
3, 386.185, 386.185,
6, 214.355, 214.355
)
# iris_log_lik$model <- as.integer(iris_log_lik$model)
```

```
#> model mplus mclust
#> 1 1 488.915 488.915
#> 2 2 296.448 296.448
#> 3 3 386.185 386.185
#> 4 6 214.355 214.355
```

What if we specify 3 profiles?

**Mplus**

```
m1_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 1)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 361.426
#> BIC is 813.042
#> Entropy is 0.957
m2_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 2)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 256.354
#> BIC is 632.963
#> Entropy is 0.962
m3_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 3)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 307.178
#> BIC is 744.632
#> Entropy is 0.948
m5_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 6)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 180.185
#> BIC is 580.839
#> Entropy is 0.97
```

**mclust**

Note that model 4 can only be specified in Mplus, so is not included here.

```
m1_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 1)
#> Fit varying means, equal variances, covariances fixed to 0 (Model 1) model with 3 profiles.
#> LogLik is 361.429
#> BIC is 813.05
#> Entropy is 0.979
m2_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 2)
#> Fit varying means, equal variances and covariances (Model 2) model with 3 profiles.
#> LogLik is 256.355
#> BIC is 632.965
#> Entropy is 0.985
m3_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 3)
#> Fit varying means and variances, covariances fixed to 0 (Model 3) model with 3 profiles.
#> LogLik is 307.181
#> BIC is 744.638
#> Entropy is 0.979
m5_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 6)
#> Fit varying means, variances, and covariances (Model 4) model with 3 profiles.
#> LogLik is 180.186
#> BIC is 580.84
#> Entropy is 0.99
```

```
#> model mplus mclust
#> 1 1 361.426 361.429
#> 2 2 256.354 256.355
#> 3 3 307.178 307.181
#> 4 6 180.185 180.186
```

**Mplus**

```
m1_mplus <- estimate_profiles_mplus(faithful,
eruptions, waiting,
n_profiles = 2,
model = 1)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 1157.68
#> BIC is 2354.601
#> Entropy is 0.993
m2_mplus <- estimate_profiles_mplus(faithful,
eruptions, waiting,
n_profiles = 2,
model = 2)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 1140.187
#> BIC is 2325.22
#> Entropy is 0.993
m3_mplus <- estimate_profiles_mplus(faithful,
eruptions, waiting,
n_profiles = 2,
model = 3)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 1147.806
#> BIC is 2346.065
#> Entropy is 0.999
m5_mplus <- estimate_profiles_mplus(faithful,
eruptions, waiting,
n_profiles = 2,
model = 6)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 1130.264
#> BIC is 2322.192
#> Entropy is 0.996
```

**mclust**

Note that model 4 can only be specified in Mplus, so is not included here.

```
m1_mclust <- estimate_profiles(faithful,
eruptions, waiting,
n_profiles = 2,
model = 1)
#> Fit varying means, equal variances, covariances fixed to 0 (Model 1) model with 2 profiles.
#> LogLik is 1157.68
#> BIC is 2354.601
#> Entropy is 0.998
m2_mclust <- estimate_profiles(faithful,
eruptions, waiting,
n_profiles = 2,
model = 2)
#> Fit varying means, equal variances and covariances (Model 2) model with 2 profiles.
#> LogLik is 1140.187
#> BIC is 2325.22
#> Entropy is 0.998
m3_mclust <- estimate_profiles(faithful,
eruptions, waiting,
n_profiles = 2,
model = 3)
#> Fit varying means and variances, covariances fixed to 0 (Model 3) model with 2 profiles.
#> LogLik is 1147.806
#> BIC is 2346.065
#> Entropy is 1
m5_mclust <- estimate_profiles(faithful,
eruptions, waiting,
n_profiles = 2,
model = 6)
#> Fit varying means, variances, and covariances (Model 4) model with 2 profiles.
#> LogLik is 1130.264
#> BIC is 2322.192
#> Entropy is 0.999
```

Here is a summary of the log-likelihood statistics:

```
#> model mplus mclust
#> 1 1 1157.680 1157.680
#> 2 2 1140.187 1140.187
#> 3 3 1147.806 1147.806
#> 4 6 1130.264 1130.264
```

Finding conditions when the models do differ, especially for large or complex datasets, is an important consideration. These differences, when encountered, are possibly due to two reasons; investigating these will be an important next step:

- Different initialization routines: Mplus uses random starts, and while mclust can use random starts, it defaults to using hierarchical clustering to initialize.
- Different estimation methods: Both use Expectation-Maximization (EM) maximum likelihood estimation methods, though Mplus may use an estimator which is more robust to non-normal distributions.