Shared-component Gaussian Mixture Models

In some applications we want to model multiple datasets simultaneously using the same Gaussian mixture model (GMM) components (mu_k and Sigma_k) across all of them. That is, all datasets share the same mu_k and Sigma_k, but each dataset s has its own set of mixing proportions pi_{s,k}.
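Concretely, an example x drawn from dataset s then has density

p(x | s) = sum_{k=1}^K pi_{s,k} N(x | mu_k, Sigma_k),    with sum_{k=1}^K pi_{s,k} = 1 for every dataset s,

so the component parameters are tied across datasets while the mixing weights are not.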

The derivation note can be found here (apologies in advance for the mess in the note).
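For reference, standard EM for this model gives the following updates, where r_{s,n,k} is the responsibility of component k for example x_{s,n} (example n in dataset s) and N_s is the size of dataset s:

E-step:  r_{s,n,k} = pi_{s,k} N(x_{s,n} | mu_k, Sigma_k) / sum_j pi_{s,j} N(x_{s,n} | mu_j, Sigma_j)

M-step:  pi_{s,k} = (1/N_s) sum_n r_{s,n,k}
         mu_k = sum_s sum_n r_{s,n,k} x_{s,n} / sum_s sum_n r_{s,n,k}
         Sigma_k = sum_s sum_n r_{s,n,k} (x_{s,n} - mu_k)(x_{s,n} - mu_k)^T / sum_s sum_n r_{s,n,k}

The only difference from ordinary GMM EM is that the mean and covariance updates pool responsibilities over all datasets, while the mixing proportions are estimated separately for each dataset.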

The MATLAB code is made available here.
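For readers who just want the shape of the algorithm, below is a minimal MATLAB sketch of the EM loop described above. It is not the linked code: it assumes the datasets are stored in a cell array X where X{s} is an N_s-by-D matrix, it uses mvnpdf from the Statistics Toolbox, and it relies on implicit expansion (R2016b or later).

```matlab
function [mu, Sigma, pis] = shared_gmm_em(X, K, maxIter)
% Minimal EM for a shared-component GMM (illustrative sketch, not the linked code).
% X   : cell array; X{s} is an N_s-by-D matrix of examples from dataset s
% K   : number of shared components
% mu  : K-by-D shared means;  Sigma : D-by-D-by-K shared covariances
% pis : S-by-K matrix of dataset-specific mixing proportions
S = numel(X);
D = size(X{1}, 2);
allX = cat(1, X{:});
mu = allX(randperm(size(allX, 1), K), :);   % initialise means from pooled data
Sigma = repmat(eye(D), [1 1 K]);
pis = ones(S, K) / K;

for it = 1:maxIter
    Nk = zeros(1, K);                       % pooled responsibility mass per component
    sumX = zeros(K, D);                     % pooled responsibility-weighted sums
    R = cell(S, 1);
    for s = 1:S                             % E-step, dataset by dataset
        Ns = size(X{s}, 1);
        logp = zeros(Ns, K);
        for k = 1:K
            logp(:, k) = log(pis(s, k)) + log(mvnpdf(X{s}, mu(k, :), Sigma(:, :, k)));
        end
        logp = logp - max(logp, [], 2);     % stabilise before normalising
        r = exp(logp);
        r = r ./ sum(r, 2);                 % responsibilities for dataset s
        R{s} = r;
        pis(s, :) = sum(r, 1) / Ns;         % per-dataset mixing proportions
        Nk = Nk + sum(r, 1);                % pool statistics for the shared M-step
        sumX = sumX + r' * X{s};
    end
    mu = sumX ./ Nk';                       % shared means
    Sigma = zeros(D, D, K);                 % shared covariances (second pass)
    for s = 1:S
        for k = 1:K
            Xc = X{s} - mu(k, :);
            Sigma(:, :, k) = Sigma(:, :, k) + (R{s}(:, k) .* Xc)' * Xc;
        end
    end
    for k = 1:K
        Sigma(:, :, k) = Sigma(:, :, k) / Nk(k) + 1e-6 * eye(D);  % regularise
    end
end
end
```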

The toy dataset

Datasets: The experiment uses 3 datasets, each containing 1000 examples, so 3000 examples in total. All three datasets are drawn from the same set of Gaussian components but with different mixing proportions (a hypothetical generator is sketched below).
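The actual means, covariances and mixing weights used here are not reproduced in this sketch; the values below, including K = 5 components in 2 dimensions, are placeholders.

```matlab
% Hypothetical toy-data generator: shared components, per-dataset weights.
% All numeric values here are placeholders, not the ones used in the post.
S = 3;  Ns = 1000;  K = 5;  D = 2;
mu_true = 4 * randn(K, D);                    % shared component means
Sigma_true = repmat(0.5 * eye(D), [1 1 K]);   % shared component covariances
X = cell(S, 1);
for s = 1:S
    pi_s = rand(1, K);
    pi_s = pi_s / sum(pi_s);                  % dataset-specific mixing weights
    z = randsample(K, Ns, true, pi_s);        % component assignment per example
    X{s} = zeros(Ns, D);
    for n = 1:Ns
        X{s}(n, :) = mvnrnd(mu_true(z(n), :), Sigma_true(:, :, z(n)));
    end
end
```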

The datasets are summarized in the figure below.

Experiment 1: shared-component GMM with K = 5

Here are some preliminary results on the shared-component GMM:

first column: the data examples in each dataset

second column: the resulting GMM in each dataset

third column: the true mu_k and pi_k for each dataset (blue) vs. those obtained from the shared-component GMM (red).

The estimated centers stay close to the true ones; where true centers from different datasets lie near each other, the fitted center looks roughly like an average of those neighbors.

Experiment 2: shared-component GMM with K = 3, 6, 7, 8 and 10

Here we want to see how the algorithm behaves when the number of clusters K does not match the true number of components.

For each K in {3, 6, 7, 8, 10}, the figures show the clustering result and the corresponding log-likelihood.
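A sketch of how such a sweep could be run, reusing the hypothetical shared_gmm_em function and the cell array X from the sketches above (again, not the actual code behind these figures):

```matlab
% Fit the shared-component GMM for several K and record the final
% total log-likelihood over all datasets.
Ks = [3 6 7 8 10];
loglik = zeros(size(Ks));
for i = 1:numel(Ks)
    [mu, Sigma, pis] = shared_gmm_em(X, Ks(i), 200);
    ll = 0;
    for s = 1:numel(X)
        p = zeros(size(X{s}, 1), 1);
        for k = 1:Ks(i)
            p = p + pis(s, k) * mvnpdf(X{s}, mu(k, :), Sigma(:, :, k));
        end
        ll = ll + sum(log(p));
    end
    loglik(i) = ll;
end
plot(Ks, loglik, '-o');  xlabel('K');  ylabel('log-likelihood');
```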

Now, we show all the results together (click for larger figures).