**MoEClust** is an R package which fits finite Gaussian
Mixtures of Experts models using a range of parsimonious covariance
parameterisations via the EM/CEM algorithm, i.e. allows incorporation of
covariates into the mixing proportions and/or Gaussian densities of
finite Gaussian mixture models under the various parsimonious covariance
parameterisations in the GPCM family (e.g. **mclust**).
These models were introduced by Murphy and Murphy
(2020). The package also facilitates the inclusion of an additional
noise component, and allows visualisation of Gaussian mixture of experts
models with parsimonious covariance parameterisations using generalised
pairs plots.

The most important function in the **MoEClust** package
is: `MoE_clust`

, for fitting the model via the EM/CEM
algorithm with gating and/or expert network covariates, supplied via
formula interfaces. `MoE_compare`

is provided for conducting
model selection between different results from `MoE_clust`

using different covariate combinations &/or initialisation
strategies, etc.

`MoE_stepwise`

is provided for conducting a greedy forward
stepwise search to identify the optimal model in terms of the number of
components, GPCM covariance type, and the subsets of gating/expert
network covariates.

`MoE_control`

allows supplying additional arguments to
`MoE_clust`

and `MoE_stepwise`

which govern, among
other things, controls on the inclusion of an additional noise component
and controls on the initialisation of the allocations for the EM/CEM
algorithm.

A dedicated plotting function exists for visualising the results
using generalised pairs plots, for examining the gating network, and/or
log-likelihood, and/or clustering uncertainties, and/or similarity
matrix, and/or graphing model selection criteria values. The generalised
pairs plots (`MoE_gpairs`

) visualise all pairwise
relationships between clustered response variables and associated
continuous, categorical, and/or ordinal covariates in the gating
&/or expert networks, coloured according to the MAP classification,
and also give the marginal distributions of each variable (incl. the
covariates) along the diagonal.

An `as.Mclust`

method is provided to coerce the output of
class `"MoEClust"`

from `MoE_clust`

to the
`"Mclust"`

class, to facilitate use of plotting and other
functions for the `"Mclust"`

class within the
**mclust** package. As per **mclust**,
**MoEClust** also facilitates modelling with an additional
noise component (with or without the mixing proportion for the noise
component depending on covariates).

Finally, a `predict`

method is provided for predicting the
fitted response and probability of cluster membership (and by extension
the MAP classification) for new data, in the form of new covariates and
new response data, or new covariates only.

Other functions also exist, e.g. `MoE_crit`

,
`MoE_dens`

, `MoE_estep`

, and `aitken`

,
which are all used within `MoE_clust`

but are nonetheless
made available for standalone use. The package also contains two data
sets: `CO2data`

and `ais`

.

This vignette aims to demonstrate the **MoEClust**
models via application to these two well-known univariate and
multivariate data sets, respectively.

**MoEClust** will run in Windows, Mac OS X, or Linux. To
install it you first need to install R. Installing RStudio as a nice desktop environment for
using R is also recommended.

Once in R you can type at the R command prompt:

to install the latest development version of the package from the
**MoEClust** GitHub page.

To instead install the latest stable official release of the package from CRAN go to R and type:

In either case, if you then type:

it will load in all the **MoEClust** functions.

The GitHub version contains a few more features but some of these may
not yet be fully tested, and occasionally this version might be liable
to break when it is in the process of being updated. If you find bugs or
want to suggest new features please visit the **MoEClust**
GitHub issues
page.

Load the CO2 data.

Fit various MoEClust mixture models to cluster the CO2 data, allowing
the GNP variable to enter the gating &/or expert networks, or
neither, via a formula interface. Also consider models with equal mixing
proportions. Note that for models with covariates in the gating network,
or models with equal mixing proportions, we don’t need to fit
single-component models (though it could be done!) as this would merely
duplicate the single-component models within `m1`

and
`m3`

, respectively.

```
m1 <- MoE_clust(CO2, G=1:3)
m2 <- MoE_clust(CO2, G=2:3, gating= ~ GNP)
m3 <- MoE_clust(CO2, G=1:3, expert= ~ GNP)
m4 <- MoE_clust(CO2, G=2:3, gating= ~ GNP, expert= ~ GNP)
m5 <- MoE_clust(CO2, G=2:3, equalPro=TRUE)
m6 <- MoE_clust(CO2, G=2:3, expert= ~ GNP, equalPro=TRUE)
```

Choose the best model among these. Specify
`optimal.only=TRUE`

so that only the optimal model in each
set of models is included in the comparison.

Now, see if a better model can be found using greedy forward stepwise
selection, with the aid of the `MoE_stepwise`

function, on
the same data. By default, this starts from a single-component model and
successively adds components and covariates to both networks, while
evaluating each action over all possible model types, until a model
which is optimal according to some criterion is found.

```
(mod1 <- MoE_stepwise(CO2, GNP))
#> ---------------------------------------------------------------------
#> Comparison of Gaussian Parsimonious Clustering Models with Covariates
#> Data: CO2
#> Ranking Criterion: BIC
#> Optimal Only: TRUE
#> ---------------------------------------------------------------------
#>
#> rank MoENames modelNames G df iters bic icl aic loglik gating
#> 1 Step_4 E 3 7 21 -155.2 -159.062 -145.875 -65.937 None
#> 2 Step_3 V 2 7 7 -157.205 -160.039 -147.88 -66.94 None
#> 3 Step_2 E 2 4 19 -163.164 -163.911 -157.835 -74.917 None
#> 4 Step_1 E 1 2 1 -163.905 -163.905 -161.24 -78.62 None
#> expert algo equalPro
#> ~GNP EM TRUE
#> ~GNP EM FALSE
#> None EM FALSE
#> None EM
```

Next, conduct another stepwise search considering models with a noise
component. This notably starts from a model with only a uniform noise
component and then proceeds as above to add Gaussian components and
covariates one at a time, accepting each action that increases a chosen
criterion (`"bic"`

, by default).

```
(mod2 <- MoE_stepwise(CO2, GNP, noise=TRUE))
#> ---------------------------------------------------------------------
#> Comparison of Gaussian Parsimonious Clustering Models with Covariates
#> Data: CO2
#> Ranking Criterion: BIC
#> Optimal Only: TRUE
#> ---------------------------------------------------------------------
#>
#> rank MoENames modelNames G df iters bic icl aic loglik gating
#> 1 Step_2 E 1 4 22 -160.781 -173.158 -155.453 -73.726 None
#> 2 Step_1 0 1 1 -165.503 -165.503 -164.171 -81.086 None
#> expert algo noise
#> None EM hypvol
#> None EM hypvol
```

Finally, compare all sets of results to choose the optimal model.

```
(best <- MoE_compare(mod1, mod2, comp, pick=1)$optimal)
#> Call: MoE_stepwise(data = CO2, network.data = GNP)
#>
#> Best Model (according to BIC):
#> univariate, equal variance (E), with 3 components
#>
#> Equal Mixing Proportions
#> BIC = -155.2 | ICL = -159.062 | AIC = -145.875
#> Including expert network covariates:
#> Expert: ~GNP
```

```
(summ <- summary(best, classification=TRUE, parameters=FALSE, networks=FALSE))
#> ------------------------------------------------------
#> Gaussian Parsimonious Clustering Model with Covariates
#> Data: CO2
#> ------------------------------------------------------
#>
#> MoEClust: E (univariate, equal variance), with 3 components
#>
#> Gating Network Covariates: None
#> Expert Network Covariates: ~GNP
#> Equal Mixing Proportions: TRUE
#> Noise Component: FALSE
#>
#> log.likelihood n d df iters BIC ICL AIC Algo
#> -65.937 28 1 7 21 -155.2 -159.062 -145.875 EM
#>
#> Clustering table :
#> 1 2 3
#> 8 10 10
```

Visualise the results for the optimal model using a generalised pairs plot.

Visualise the density of the mixture distribution.

Convert from the `"MoEClust"`

class to the
`"Mclust"`

class in order to further visualise the results.
Examine the `"classification"`

and `"uncertainty"`

options.

Predictions can also be made from `MoEClust`

models: the
response, probability of cluster membership, and the MAP classification
can be predicted for the fitted data or for new data (in the form of new
covariates and new response variables, or new covariates only). Let’s
predict the response variable using the optimal model fit above to the
CO2 data.

```
#> [1] 14.258797 3.901356 20.461833 9.057538 8.292203 14.981863 6.849704
#> [8] 6.679846 9.695510 10.632816 9.451615 9.831188 6.255028 9.590701
#> [15] 7.237534 5.289565 9.782900 6.588562 9.531164 17.968480 8.514569
#> [22] 6.936316 6.725192 6.275709 5.546887 3.319349 9.910969 10.736908
```

Now let’s build a model on some of the CO2 data and retain the indices of the withheld observations:

```
ind <- sample(1:nrow(CO2data), 2)
res <- MoE_clust(CO2data[-ind,]$CO2, G=3, expert= ~ GNP,
equalPro=TRUE, network.data=CO2data[-ind,])
```

Now we can make predictions on the withheld data, either by using the
withheld covariates only, or by also using the withheld response
variables. Note that `newdata`

can be either a list with
component(s) `new.x`

(and optionally `new.y`

) or a
single matrix/data.frame with the appropriate columns.

```
# Using new covariates only
predict(res, newdata = CO2data[ind,], use.y = FALSE)[1:3]
# Using both new covariates & new response data
predict(res, newdata = CO2data[ind,])[1:3]
```

```
#> y :
#> CO2
#> 1 7.054007
#> 2 10.383095
#>
#> classification :
#> 1 2
#> 1 1
#>
#> z :
#> Cluster1 Cluster2 Cluster3
#> 1 0.3333333 0.3333333 0.3333333
#> 2 0.3333333 0.3333333 0.3333333
#> y :
#> CO2
#> 1 8.602999
#> 2 10.202224
#>
#> classification :
#> 1 2
#> 2 3
#>
#> z :
#> Cluster1 Cluster2 Cluster3
#> 1 2.882222e-07 0.549832047 0.4501677
#> 2 2.787651e-08 0.005682698 0.9943173
```

Load the Australian Institute of Sports data.

Examine the various additional options around initialisation of the algorithm:

Fit a parsimonious Gaussian mixture of experts MoEClust model to the
hematological variables within the AIS data, supplying `sex`

in the expert network and `BMI`

in the gating network via
formula interfaces. Include an additional noise component by specifying
it’s prior mixing proportion `tau0`

. Toggle between allowing
the mixing proportion for the noise component depend on the gating
concomitant or not via the `noise.gate`

argument. This time,
allow the printing of messages to the screen.

```
mod <- MoE_clust(hema, G=1:3, expert= ~ sex, gating= ~ BMI,
network.data=ais, tau0=0.1, noise.gate=FALSE)
```

Visualise the results for the optimal model using a generalised pairs plot.

Replace the scatterplots in response vs. response panels with bivariate density contours. Note that this is liable to be slow for models with expert network covariates.

Visualise the clustering uncertainty for the optimal model using a generalised pairs plot.

Instead visualise the clustering uncertainty in the form of an
ordered profile plot (`type="barplot"`

can also be specified
here).

Plot the BIC of the visited models.