Shared-component Gaussian Mixture Models
In some applications we want to model multiple datasets simultaneously using the same Gaussian mixture model (GMM) components (mu_k and Sigma_k) across all of the datasets. That is, all datasets share the same mu_k and Sigma_k, but each dataset s has its own set of mixing weights pi_k.
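To make the setup concrete, here is a minimal NumPy sketch of EM for this model (not the original MATLAB implementation): the E-step uses each dataset's own weights, while the M-step pools responsibilities across datasets when re-estimating the shared mu_k and Sigma_k. The function name, initialization, and regularization constant are my own choices, not from the original code.

```python
import numpy as np

def shared_gmm_em(datasets, K, n_iter=50, seed=0):
    """EM for a GMM whose means/covariances are shared across S datasets,
    while each dataset s keeps its own mixing weights pi[s].
    `datasets` is a list of (n_s, d) arrays. A minimal sketch:
    full covariances, no restarts or convergence checks."""
    rng = np.random.default_rng(seed)
    X = np.vstack(datasets)                      # pooled data, (N, d)
    d = X.shape[1]
    S = len(datasets)
    mu = X[rng.choice(len(X), K, replace=False)] # init means from data
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    pi = np.full((S, K), 1.0 / K)                # per-dataset weights

    def gauss(x, m, C):
        # multivariate normal density, returns (n,) for input (n, d)
        diff = x - m
        inv = np.linalg.inv(C)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(C))
        return np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)) / norm

    for _ in range(n_iter):
        # E-step: responsibilities, computed per dataset with its own pi[s]
        R = []
        for s, Xs in enumerate(datasets):
            dens = np.stack([pi[s, k] * gauss(Xs, mu[k], Sigma[k])
                             for k in range(K)], axis=1)
            R.append(dens / dens.sum(axis=1, keepdims=True))
        # M-step: pi is per dataset; mu and Sigma pool all responsibilities
        for s in range(S):
            pi[s] = R[s].mean(axis=0)
        Rall = np.vstack(R)                      # (N, K), matches X order
        Nk = Rall.sum(axis=0)
        mu = (Rall.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = ((Rall[:, k, None] * diff).T @ diff / Nk[k]
                        + 1e-6 * np.eye(d))
    return mu, Sigma, pi
```

The only difference from standard EM is in the M-step: the weight update is done per dataset, while the sufficient statistics for mu_k and Sigma_k are summed over all datasets.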
The derivation note can be found here (apologies in advance for the mess in the note).
The MATLAB code is made available here.
The toy dataset
Datasets: The experiment contains 3 datasets, each with 1000 data examples (3000 examples in total). Each dataset has the following components:
The datasets are summarized in the figure below.
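A toy dataset of this kind can be generated as follows. This is a hypothetical sketch: the component means, standard deviations, and per-dataset mixing weights below are placeholders I chose for illustration, not the parameters used in the original experiment.

```python
import numpy as np

def sample_shared_gmm(mus, sigmas, pis, n, rng):
    """Draw n points from a GMM with shared component means `mus` (K, d)
    and isotropic std devs `sigmas` (K,), using this dataset's own
    mixing weights `pis` (K,)."""
    ks = rng.choice(len(pis), size=n, p=pis)   # sample component labels
    return mus[ks] + sigmas[ks, None] * rng.normal(size=(n, mus.shape[1]))

rng = np.random.default_rng(0)
# shared components (placeholder values)
mus = np.array([[0., 0.], [4., 0.], [2., 3.], [-3., 3.], [0., -4.]])
sigmas = np.full(5, 0.7)
# three datasets of 1000 points each, sharing mus/sigmas but not weights
weights = [np.array([.4, .3, .1, .1, .1]),
           np.array([.1, .4, .3, .1, .1]),
           np.array([.1, .1, .1, .3, .4])]
datasets = [sample_shared_gmm(mus, sigmas, w, 1000, rng) for w in weights]
```

Each dataset thus contains points from the same five Gaussians, but the clusters are populated in different proportions.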
Experiment 1: shared-component GMM with K = 5
Here are some preliminary results from the shared-component GMM:
first column: the data examples in each dataset
second column: the resulting GMM in each dataset
third column: the true mu_k and pi_k for each dataset (blue) vs. those obtained from the shared-component GMM (red).
The estimated centers stay close to the true ones, and each can be seen as an averaged version of its neighboring true centers.
Experiment 2: shared-component GMM with K = 3, 6, 7, 8, and 10
Here we want to see how the algorithm behaves when the number of clusters K differs from the true number.
For each K in {3, 6, 7, 8, 10}, the figures below show the clustering result and the log-likelihood.
Now, we show all the results together (click for larger figures).
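For comparing fits across different K, one quantity to track is the total log-likelihood of all datasets under the fitted shared-component model. The helper below is a sketch of that computation (the function name and interface are my own, not from the original code); note that on training data the likelihood never decreases as K grows, so comparisons across K should keep model complexity in mind.

```python
import numpy as np

def total_log_likelihood(datasets, mu, Sigma, pi):
    """Summed log-likelihood of all datasets under a shared-component GMM.
    mu: (K, d) shared means, Sigma: (K, d, d) shared covariances,
    pi: (S, K) per-dataset mixing weights."""
    total = 0.0
    d = mu.shape[1]
    for s, X in enumerate(datasets):
        logp = np.zeros((len(X), len(mu)))
        for k in range(len(mu)):
            diff = X - mu[k]
            inv = np.linalg.inv(Sigma[k])
            quad = np.einsum('ni,ij,nj->n', diff, inv, diff)
            logn = -0.5 * (quad + d * np.log(2 * np.pi)
                           + np.log(np.linalg.det(Sigma[k])))
            logp[:, k] = np.log(pi[s, k]) + logn
        # log-sum-exp over components for numerical stability
        m = logp.max(axis=1, keepdims=True)
        total += (m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))).sum()
    return total
```

Calling this on the fits for K = 3, 6, 7, 8, and 10 reproduces the kind of log-likelihood comparison shown above.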