Information-theoretic learning (ITL) MATLAB toolbox

This is a MATLAB toolbox for information-theoretic learning (ITL). Although the toolbox is still at an early stage of development, it aims to provide readable, self-documented, and reasonably fast code.

Version 1.0 beta

Example

Download the ITL MATLAB toolbox and run demo_useMI.m. The demo first plots the 2D data samples against their class labels, which should look like the plot below.

The main function to calculate the mutual information is

MI = calculateMIComplete(c,X,w,numEstimate);

where

c: an (m x 1) class-label vector, where m is the number of data instances/samples
X: an (m x n) feature matrix, where n is the number of features (here, n = 2)
w = 0.05; % the sigma of the Gaussian kernel
numEstimate = 100; % number of samples used for the non-parametric Bayesian estimate
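
As a quick illustration, here is a minimal usage sketch on synthetic data (the data-generating code is mine, and I assume MI returns one value per feature column; the toolbox's own demo may differ):

    % Minimal usage sketch (assumes the toolbox is on the MATLAB path).
    % The synthetic data below is illustrative, not the demo's own data.
    m  = 200;                          % number of samples
    c  = [zeros(m/2,1); ones(m/2,1)];  % binary class labels
    x1 = randn(m,1) + 2*c;             % discriminative feature
    x2 = randn(m,1);                   % pure-noise feature
    X  = [x1 x2];                      % (m x 2) feature matrix

    w = 0.05;                          % sigma of the Gaussian kernel
    numEstimate = 100;                 % samples for the non-parametric estimate

    MI = calculateMIComplete(c, X, w, numEstimate);
    % Assuming MI holds one value per feature column:
    fprintf('MI(x1;c) = %.3f, MI(x2;c) = %.3f\n', MI(1), MI(2));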

From the plot, feature 1 (x1) clearly has better discriminative power than feature 2 (x2). We therefore expect the mutual information between the class label c and x1 to be greater than that between c and x2, that is, MI(x1;c) > MI(x2;c). The output of the demo confirms this hypothesis:

Calculating mutual information...
Calculating MI takes 0.063168 sec for 2 voxels
the MI for features x1:0.757 and x2:0.018

Version 2.0 beta

In this version, several major changes have been made:

  1. Many more ITL measures have been added: entropy, KL divergence, symmetric KL divergence, Jensen-Shannon divergence, and information gain ratio (IGR). A small reference sketch follows this list.
  2. Function names are self-explanatory about whether a function operates on a discrete distribution or on samples/signals.
  3. The code is more readable.
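
For orientation, here is a small self-contained reference sketch of two of the new discrete-distribution measures (my own implementation for comparison; the toolbox's function names may differ):

    % Reference implementations of two discrete measures listed above
    % (written for this example; not the toolbox's own functions).
    p = [0.4 0.3 0.2 0.1];        % discrete distribution p
    q = [0.25 0.25 0.25 0.25];    % discrete distribution q (uniform)

    kl  = @(a,b) sum(a .* log2(a ./ b));   % KL divergence D(a||b), in bits
    mix = 0.5*(p + q);                     % mixture for Jensen-Shannon
    symKL = kl(p,q) + kl(q,p);             % symmetric KL divergence
    jsd   = 0.5*kl(p,mix) + 0.5*kl(q,mix); % Jensen-Shannon divergence

    fprintf('KL = %.4f, symKL = %.4f, JS = %.4f bits\n', kl(p,q), symKL, jsd);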

How to use:

  1. Download the toolbox and unzip it.
  2. Run test_unit1.m for the demo.

Remarks on version 2.0 beta

  1. All the functions for discrete distributions have been carefully checked and agree with the theoretical values; see test_unit1.m.
  2. Although the code works well for discrete distributions, a drawback arises when it is used with samples (i.e., from a continuous distribution). In that case a non-parametric pdf estimator (e.g., a Parzen window) is needed to estimate the pdf from the samples, and the numerical output is then not identical to the theoretical value; rather, it is a scaled version of it. The code is therefore still suitable for feature selection and comparison, as long as you do not need the exact theoretical value from sample-based calculations. A small illustration follows this list.
  3. If you really need the theoretical value from sample-based calculations or continuous distributions, a more accurate estimator is required. I highly recommend the MATLAB library of Rudy Moddemeijer, which is quite accurate (as far as I have tested it).
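
To illustrate remark 2, here is a sketch (my own, assuming a Gaussian Parzen kernel and a resubstitution estimate) comparing a sample-based entropy estimate with the closed-form differential entropy of a standard Gaussian; the kernel width w biases the estimate, so the two values generally differ:

    % Parzen-window (resubstitution) entropy estimate vs. the closed-form
    % differential entropy of N(0,1), which is 0.5*log2(2*pi*e) bits.
    % Requires R2016b+ for implicit expansion in the pairwise differences.
    x = randn(1000,1);                        % samples from N(0,1)
    w = 0.1;                                  % Parzen kernel width (sigma)
    D = x.' - x;                              % pairwise differences (1000x1000)
    K = exp(-D.^2/(2*w^2)) / (w*sqrt(2*pi));  % Gaussian kernel values
    pdfHat = mean(K, 1).';                    % pdf estimate at each sample
    Hhat   = -mean(log2(pdfHat));             % estimated entropy (bits)
    Htrue  = 0.5*log2(2*pi*exp(1));           % theoretical value (~2.047 bits)
    fprintf('Parzen estimate: %.3f bits, theory: %.3f bits\n', Hhat, Htrue);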

Other ITL resources and toolboxes

  1. MATLAB library of Rudy Moddemeijer [url]: a very good estimator for computing MI, entropy, and conditional entropy from samples.
  2. TIM toolbox by Kalle Rutanen: a large collection of information-theoretic code, including Rényi entropy, Shannon differential entropy, Tsallis entropy, KL divergence, etc.
  3. A nice summary of information theory by Gal Chechik [pdf].
  4. A very nice self-contained tutorial, "Tutorial: Information Theory and Statistics", by Bin Yu at ICMLA'08.
  5. "Fast Calculation of Pairwise Mutual Information Based on Kernel Estimation": a MATLAB implementation by Peng Qiu.