Hierarchical biclustering toolbox

Since CoClustering initialization is based on k-means, the resulting cluster topology is not very stable and depends heavily on the k-means result. MATLAB provides the function clustergram for biclustering, but it does not offer a convenient way to extract the cluster index of each row/column or the permutation vector. So I decided to rebuild the biclustering function on top of the hierarchical clustering functions provided by MATLAB (i.e., linkage and dendrogram), hence the name hierarchical biclustering algorithm.

The hierarchical biclustering algorithm is stable because building the hierarchical tree is deterministic: the same input matrix always yields the same tree. Reducing the tree to the desired number of clusters is deterministic too. Therefore, hierarchical biclustering is a convenient and repeatable way to cluster the data.
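As a small check of the determinism claim, the sketch below builds the tree twice from the same matrix and cuts it into three clusters each time. It assumes the Statistics and Machine Learning Toolbox is available. The toy matrix X and the 'average' linkage with Euclidean distance are illustrative assumptions, not settings taken from the toolbox.

```matlab
% Illustrative toy data (an assumption, not data from the toolbox):
% three tight pairs of rows.
X = [0 0 1; 0 0 1.1; 1 0 0; 1.1 0 0; 0 1 0; 0 1.1 0];

% Build the agglomerative tree twice from the same input.
Z1 = linkage(X, 'average', 'euclidean');
Z2 = linkage(X, 'average', 'euclidean');

% Cut both trees into 3 clusters.
T1 = cluster(Z1, 'maxclust', 3);
T2 = cluster(Z2, 'maxclust', 3);

% No random initialization is involved, so both runs agree exactly.
same_tree   = isequal(Z1, Z2);
same_labels = isequal(T1, T2);
```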

How to use the toolbox:

  1. Download the toolbox and unzip it.
  2. Look at the demo files in the toolbox. The key function is Biclustering(..).
  3. That's it. Now you can use it.

Remark:

  1. Most of the code in the toolbox is well commented and easy to follow, so you might want to read the comments there.
  2. Since the toolbox is built on the hierarchical clustering algorithm, it contains a demo for that algorithm too.

Hierarchical biclustering algorithm function:

Here are the details of the function Biclustering(..), which is built on the functions linkage and dendrogram provided by MATLAB.

    function [row_clust_idx, col_clust_idx, y_index, x_index] = Biclustering(A, k_row, k_col)
    % ===== INPUT =====
    % A: MxN input data matrix, where M is the number of examples/voxels; N is
    %    the dimensionality of the feature
    % k_row: the number of clusters in row
    % k_col: the number of clusters in column
    % ===== OUTPUT =====
    % row_clust_idx: the cluster label given to each row of matrix A; thus,
    %    the order is with respect to the original matrix A.
    % col_clust_idx: the cluster label given to each col of matrix A; thus,
    %    the order is with respect to the original matrix A.
    % y_index: The row permutation vector to convert the original space to the new
    %    biclustering space. That is, A_row_rearranged = A(y_index,:).
    % x_index: The column permutation vector to convert the original space to the new
    %    biclustering space. That is, A_col_rearranged = A(:,x_index).

We first use linkage to build an agglomerative clustering tree on both the rows and the columns. Then we use dendrogram to group the rows and columns into k_row and k_col clusters, as specified by the user.
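The two steps above can be sketched roughly as follows. This is not the toolbox source: the linkage method ('average'), the Euclidean metric, the use of cluster(..., 'maxclust', k) to cut the tree, and reading the leaf order from the third output of dendrogram are all assumptions made for illustration. The actual Biclustering(..) may differ in any of these. The sketch requires the Statistics and Machine Learning Toolbox.

```matlab
% Noise-free version of the example matrix used later in this post.
A = [0 0 1 0 0 1   0 1;
     0 0 0 0 0 0   0 0;
     0 1 0 1 0 0   1 0;
     0 0 0 0 0 0   0 0;
     0 0 1 0 0 0.5 0 0.6;
     0 1 0 1 0 0   1 0];
k_row = 3; k_col = 3;

% Step 1: agglomerative trees over the rows and over the columns.
Z_row = linkage(A,  'average', 'euclidean');   % assumed method/metric
Z_col = linkage(A', 'average', 'euclidean');

% Step 2: cut each tree into the requested number of clusters.
% Labels are in the original row/column order of A.
row_clust_idx = cluster(Z_row, 'maxclust', k_row);
col_clust_idx = cluster(Z_col, 'maxclust', k_col);

% Leaf order of the full dendrogram (P = 0 shows every leaf) gives a
% permutation that places members of the same cluster next to each other.
fig = figure('Visible', 'off');
[~, ~, y_index] = dendrogram(Z_row, 0);
[~, ~, x_index] = dendrogram(Z_col, 0);
close(fig);

% Rearranged matrix in the biclustering space.
A_xy = A(y_index, x_index);
```

Cutting with cluster and ordering with dendrogram keeps the labels and the permutation consistent, because both are derived from the same tree.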

Example:

[The example is from biclustering_from_heirarchical3.m in the toolbox.]

1) Load the data and prepare the necessary information. The data is plotted in the top-left panel of the figure (at the end of this section).

    x = [0 0 1 0 0 1   0 1;
         0 0 0 0 0 0   0 0;
         0 1 0 1 0 0   1 0;
         0 0 0 0 0 0   0 0;
         0 0 1 0 0 0.5 0 0.6;
         0 1 0 1 0 0   1 0];
    A = x+0.01*randn(size(x));
    figure(1);
    set(gcf, 'Position', get(0,'Screensize')); % Maximize figure.
    subplot(2,2,1);
    imagesc(A);
    title('original input matrix');
    xlabel('original order index');
    ylabel('original order index');
    %%
    [numRow,numCol] = size(A);
    row_name = 1:numRow; row_name = num2cell(row_name);
    col_name = 1:numCol; col_name = num2cell(col_name);

2) Choose the number of clusters for both rows and columns, then call the function Biclustering.

    %% Hierarchical biclustering along the rows and columns
    k_row = 3; k_col = 3;
    [row_clust_idx, col_clust_idx, y_index, x_index] = Biclustering(A, k_row, k_col);

3) Plot the cluster labels on the original input data matrix A. row_clust_idx and col_clust_idx are the cluster labels for row and column, and they preserve the original order of the input data matrix. The figure is shown at the top-right.

    %% Plot the result
    figure(1);
    % Plot the end-result cluster label for each row and column
    subplot(2,2,2); imagesc(A);
    set(gca,'YTick',1:numRow); set(gca,'YTickLabel',row_clust_idx);
    set(gca,'XTick',1:numCol); set(gca,'XTickLabel',col_clust_idx);
    title('original input matrix');
    xlabel('cluster index with original order');
    ylabel('cluster index with original order');
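Because the labels keep the original row and column order, they can be used directly as logical masks on the matrix. The sketch below computes the mean of every (row cluster, column cluster) block. The label vectors are hypothetical values of the kind Biclustering(..) might return for the noise-free example, not actual toolbox output, and block_mean is a name introduced here for illustration.

```matlab
% Noise-free version of the example matrix from this post.
x = [0 0 1 0 0 1   0 1;
     0 0 0 0 0 0   0 0;
     0 1 0 1 0 0   1 0;
     0 0 0 0 0 0   0 0;
     0 0 1 0 0 0.5 0 0.6;
     0 1 0 1 0 0   1 0];

% Hypothetical labels (assumed for illustration), in original order.
row_clust_idx = [1 2 3 2 1 3]';
col_clust_idx = [1 2 3 2 1 3 2 3]';

k_row = max(row_clust_idx);
k_col = max(col_clust_idx);

% Mean value of each bicluster block.
block_mean = zeros(k_row, k_col);
for r = 1:k_row
    for c = 1:k_col
        block = x(row_clust_idx == r, col_clust_idx == c);
        block_mean(r, c) = mean(block(:));
    end
end
```

With these assumed labels, only two blocks have a nonzero mean, which matches the two visible patterns in the input matrix.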

4) We might want to rearrange the input data matrix so that we can see the clusters forming. In that case we use x_index and y_index, the permutation vectors for the columns and rows of the input data matrix A. The order of the labels is determined by the dendrogram function in MATLAB.

    % Rearrange the cluster according to the dendrogram
    A_x = A(:,x_index);
    A_xy = A_x(y_index,:);
    subplot(2,2,3); imagesc(A_xy);
    set(gca,'YTick',1:numRow); set(gca,'YTickLabel',row_name(y_index));
    set(gca,'XTick',1:numCol); set(gca,'XTickLabel',col_name(x_index));
    title('cluster rearranged by the dendrogram');
    xlabel('original index');
    ylabel('original index');
    subplot(2,2,4); imagesc(A_xy);
    set(gca,'YTick',1:numRow); set(gca,'YTickLabel',row_clust_idx(y_index));
    set(gca,'XTick',1:numCol); set(gca,'XTickLabel',col_clust_idx(x_index));
    title('cluster rearranged by the dendrogram');
    xlabel('cluster index');
    ylabel('cluster index');

The plot results might look like the following:


and its corresponding dendrogram is shown below. The order of the row clusters is 1, 2, 3, whereas the column order is 1, 3, 2. This order is visible in the bottom-right panel of the figure above.
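The cluster order can also be read off programmatically from the labels and the permutation vector, rather than from the figure. In the sketch below, col_clust_idx and x_index are hypothetical values chosen to reproduce a column order of 1, 3, 2. They are not output copied from the toolbox.

```matlab
% Hypothetical outputs (assumed values for illustration only).
col_clust_idx = [1 2 3 2 1 3 2 3]';   % labels in original column order
x_index       = [1 5 3 6 8 2 4 7];    % column permutation

% Cluster label of each column, in the order it appears in the plot.
labels_in_plot_order = col_clust_idx(x_index);

% Order in which the clusters appear from left to right.
cluster_order = unique(labels_in_plot_order, 'stable');

% Positions between columns where the cluster label changes.
boundaries = find(diff(labels_in_plot_order) ~= 0) + 0.5;
```

The values in boundaries are in the axis units used by imagesc, so they could be used to draw separator lines between clusters on the rearranged plot.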

Now, if we keep k_col = 3 and change k_row to 4, we get the figure below:

and its corresponding dendrogram:
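Because both row partitions are cut from the same tree, going from k_row = 3 to k_row = 4 splits one existing row cluster and leaves the others untouched. The sketch below checks this nesting on the noise-free example rows. The 'average' linkage, the Euclidean metric and the use of cluster(..., 'maxclust', k) are assumptions for illustration, and the toolbox may use different settings. It requires the Statistics and Machine Learning Toolbox.

```matlab
% Noise-free version of the example matrix from this post.
x = [0 0 1 0 0 1   0 1;
     0 0 0 0 0 0   0 0;
     0 1 0 1 0 0   1 0;
     0 0 0 0 0 0   0 0;
     0 0 1 0 0 0.5 0 0.6;
     0 1 0 1 0 0   1 0];

% One tree over the rows, cut at two different levels.
Z_row = linkage(x, 'average', 'euclidean');   % assumed method/metric
T3 = cluster(Z_row, 'maxclust', 3);
T4 = cluster(Z_row, 'maxclust', 4);

% Every cluster of the 4-way cut should lie inside exactly one
% cluster of the 3-way cut.
is_nested = true;
for c = 1:max(T4)
    parents = unique(T3(T4 == c));
    is_nested = is_nested && (numel(parents) == 1);
end
```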