Estimate parameters for profiles for a specific solution

estimate_profiles(df, ..., n_profiles, model = 1, to_return = "tibble",
  center_raw_data = FALSE, scale_raw_data = FALSE,
  return_posterior_probs = TRUE, return_orig_df = FALSE,
  prior_control = FALSE, print_which_stats = "some")

Arguments

df

data.frame with two or more columns with continuous variables

...

unquoted variable names separated by commas

n_profiles

the number of profiles (or mixture components) to be estimated

model

the mclust model to explore: 1 (varying means, equal variances, and residual covariances fixed to 0); 2 (varying means, equal variances and covariances); 3 (varying means and variances, covariances fixed to 0); 4 (varying means and variances, equal covariances; can only be specified in Mplus); 5 (varying means, equal variances, varying covariances); and 6 (varying means, variances, and covariances), in order from least to most freely estimated; see the introductory vignette for more information
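The difference between these structures can be easier to see on concrete matrices. The following is a minimal base R sketch, not code from the package: it builds illustrative residual covariance matrices for two variables in two profiles under models 1, 2, 3, and 6. All numeric values and object names are hypothetical.

```r
# Hypothetical residual covariance matrices for 2 variables in 2 profiles.
# Values are made up for illustration; they are not estimates from any data.

# Model 1: equal variances across profiles, covariances fixed to 0
shared_diag <- diag(c(1.0, 2.0))
model1 <- list(profile1 = shared_diag, profile2 = shared_diag)

# Model 2: equal variances and equal covariances across profiles
shared_full <- matrix(c(1.0, 0.3,
                        0.3, 2.0), nrow = 2, byrow = TRUE)
model2 <- list(profile1 = shared_full, profile2 = shared_full)

# Model 3: variances differ by profile, covariances fixed to 0
model3 <- list(profile1 = diag(c(1.0, 2.0)),
               profile2 = diag(c(0.5, 3.0)))

# Model 6: variances and covariances both differ by profile
model6 <- list(profile1 = matrix(c(1.0, 0.3,
                                   0.3, 2.0), nrow = 2, byrow = TRUE),
               profile2 = matrix(c(0.5, -0.2,
                                   -0.2, 3.0), nrow = 2, byrow = TRUE))
```

In every case the profile means are free to differ; only the constraints on the variances and covariances change from one model to the next.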

to_return

character string, either "tibble" (or "data.frame") or "mclust"; if "tibble" is selected, then data with a column for profiles is returned; if "mclust" is selected, then output of class mclust is returned

center_raw_data

logical for whether to center (M = 0) the raw data (before clustering); defaults to FALSE

scale_raw_data

logical for whether to scale (SD = 1) the raw data (before clustering); defaults to FALSE
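As a rough illustration of what centering and scaling do to the raw data, the sketch below uses base R's scale() on the iris measurements. Whether estimate_profiles() calls scale() internally is an assumption; the object names are hypothetical.

```r
# Select the four continuous iris variables
vars <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# Centering only: subtract each column's mean, so every column has M = 0
centered <- scale(vars, center = TRUE, scale = FALSE)

# Centering and scaling: additionally divide by each column's SD, so SD = 1
standardized <- scale(vars, center = TRUE, scale = TRUE)

round(colMeans(centered), 10)   # column means after centering
apply(standardized, 2, sd)      # column SDs after scaling
```

Standardizing can help when the variables are measured on very different scales, because variables with large variances can otherwise dominate the solution.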

return_posterior_probs

TRUE or FALSE (only applicable if to_return == "tibble"); whether to include posterior probabilities in addition to the posterior profile classification; defaults to TRUE
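To make the link between posterior probabilities and the profile classification concrete, here is a minimal base R sketch for a univariate two-profile mixture. The mixing weights, means, standard deviations, and observations are all hypothetical, and the code is not taken from the package; it only applies Bayes' rule to the weighted normal densities.

```r
# Hypothetical two-profile univariate mixture (illustrative values only)
weights <- c(0.5, 0.5)   # mixing proportions
means   <- c(0, 4)       # profile means
sds     <- c(1, 1)       # equal standard deviations across profiles

x <- c(-0.5, 1.5, 4.2)   # three observations to classify

# Weighted density of each observation under each profile
# (rows = observations, columns = profiles)
dens <- sapply(seq_along(weights),
               function(k) weights[k] * dnorm(x, mean = means[k], sd = sds[k]))

# Bayes' rule: normalize each row so the probabilities sum to 1
posterior <- dens / rowSums(dens)

# Classification is the most probable profile; its probability is kept alongside
profile        <- apply(posterior, 1, which.max)
posterior_prob <- apply(posterior, 1, max)
```

An observation lying between the two profile means, such as the second one here, receives a posterior probability well below 1, which is the kind of information lost when only the classification is kept.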

return_orig_df

TRUE or FALSE; if TRUE, then the entire data.frame is returned; if FALSE, then only the variables used in the model are returned; defaults to FALSE

prior_control

logical for whether to include a regularizing prior; defaults to FALSE

print_which_stats

if set to "some", prints (as a message) the log-likelihood, BIC, and entropy; if set to "all", prints (as a message) all information criteria and other statistics about the model; if set to any other value, then nothing is printed
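For readers unfamiliar with the entropy statistic, the sketch below computes one common definition of relative entropy from a matrix of posterior probabilities. Whether this is the exact formula behind the printed value is an assumption; relative_entropy() and the example matrices are hypothetical names introduced for illustration.

```r
# Relative entropy of a posterior probability matrix
# (rows = observations, columns = profiles).
# Assumed definition: 1 - sum(-p * log(p)) / (n * log(K))
relative_entropy <- function(post) {
  n <- nrow(post)
  k <- ncol(post)
  # Treat 0 * log(0) as 0 so that certain classifications contribute nothing
  terms <- ifelse(post > 0, -post * log(post), 0)
  1 - sum(terms) / (n * log(k))
}

# Perfectly separated profiles: every observation is classified with certainty
crisp <- rbind(c(1, 0), c(0, 1))

# No separation at all: every observation is equally likely in each profile
vague <- rbind(c(0.5, 0.5), c(0.5, 0.5))

relative_entropy(crisp)
relative_entropy(vague)
```

Under this definition, values near 1 indicate clearly separated profiles and values near 0 indicate that the profiles are hard to tell apart.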

Value

either a tibble (or data.frame) with a column for the profiles, or an object of class mclust, depending on to_return

Details

Creates profiles (or estimates of the mixture components) for a specific mclust model, defined by the number of mixture components and the structure of the residual covariance matrix.

Examples

estimate_profiles(iris, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, model = 1, n_profiles = 3)
#> Fit varying means, equal variances, covariances fixed to 0 (Model 1) model with 3 profiles.
#> LogLik is 361.429
#> BIC is 813.05
#> Entropy is 0.979
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width profile posterior_prob
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>            <dbl>
#>  1         5.10        3.50         1.40       0.200 1                   1.
#>  2         4.90        3.00         1.40       0.200 1                   1.
#>  3         4.70        3.20         1.30       0.200 1                   1.
#>  4         4.60        3.10         1.50       0.200 1                   1.
#>  5         5.00        3.60         1.40       0.200 1                   1.
#>  6         5.40        3.90         1.70       0.400 1                   1.
#>  7         4.60        3.40         1.40       0.300 1                   1.
#>  8         5.00        3.40         1.50       0.200 1                   1.
#>  9         4.40        2.90         1.40       0.200 1                   1.
#> 10         4.90        3.10         1.50       0.100 1                   1.
#> # ... with 140 more rows