Experiment on SA

We show the experiment using SA for voxel selection. Three classification algorithms are used: SVM, LR, and GNB. The strategies used to direct the SA are obtained from combinations of the following criteria:

  • the regularization constant c
  • voxel order:
    • 1) randomly shuffle the voxels: the order of the voxels fed to the SA is randomly shuffled.
    • 2) MI-descending order: the order of the voxels fed to the SA follows descending mutual information (MI).
  • voxels introduced to the SA:
    • 1) 50-50: a voxel has a 50% chance of being included in the new solution proposed to the SA. Admittedly, we may waste some cycles in the early rounds when a voxel that is already excluded from the solution gets "excluded" again, leaving the solution unchanged.
    • 2) toggle: if a voxel is included in the current solution, it is excluded from the new solution proposed to the SA (and vice versa).

Therefore, there are 8 combinations in total for each choice of classifier. The SA parameters are set as follows:

% SA parameters
T = 1;                % initial temperature
T_stop = 0.005;       % stopping temperature
alpha = 0.9;          % cooling rate
itt_max = 5000;       % maximum number of iterations overall
Rep_T_max = 200;      % max number of repetitions at one temperature T
Rep_accept_max = 50;  % if this many solutions are accepted, update T
kc = 10;              % user-defined k-constant; smaller kc --> fewer solutions accepted
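Putting the parameters together, the main loop looks roughly as follows. The actual code is MATLAB; this is only a Python sketch in which `next_solution` and `objective` stand in for NextSolutionX.m and ObjectiveFunctionX.m, and the acceptance rule exp(delta / (kc*T)) is one plausible reading of how kc enters (smaller kc means fewer downhill moves accepted, as noted above):

```python
import math
import random

def simulated_annealing(x0, objective, next_solution,
                        T=1.0, T_stop=0.005, alpha=0.9,
                        itt_max=5000, rep_T_max=200,
                        rep_accept_max=50, kc=10, rng=random):
    """Maximize objective(x) by simulated annealing (sketch)."""
    x = best = x0
    fx = fbest = objective(x0)
    itt = 0
    while T > T_stop and itt < itt_max:
        rep = accepted = 0
        while rep < rep_T_max and accepted < rep_accept_max and itt < itt_max:
            x_new = next_solution(x)
            f_new = objective(x_new)
            # Accept uphill moves always, downhill moves with a
            # Boltzmann-like probability; kc acts as the Boltzmann
            # constant: smaller kc --> fewer downhill moves accepted.
            if f_new >= fx or rng.random() < math.exp((f_new - fx) / (kc * T)):
                x, fx = x_new, f_new
                accepted += 1
                if fx > fbest:
                    best, fbest = x, fx
            rep += 1
            itt += 1
        T *= alpha  # geometric cooling with rate alpha
    return best, fbest
```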

The code explanation

Mapping function for solution

NextSolutionX.m is the mapping function for the solution, that is,

x_new <-- NextSolutionX(x_current)
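A Python sketch of the proposal step (the actual NextSolutionX.m is MATLAB and may differ): voxels are visited one at a time in the chosen order (shuffled or MI-descending), and the visited voxel is changed according to the 50-50 or toggle scheme:

```python
import random

def next_solution(x, i, scheme="50-50", rng=random):
    """Propose a new voxel-inclusion mask from the current one.

    x is a list of booleans, x[j] = True if voxel j is in the solution;
    i is the index of the voxel currently being visited.
    """
    x_new = list(x)
    if scheme == "50-50":
        # Voxel i has a 50% chance of being included, regardless of
        # whether it is currently in the solution.
        x_new[i] = rng.random() < 0.5
    elif scheme == "toggle":
        # Flip voxel i: included -> excluded and vice versa.
        x_new[i] = not x[i]
    else:
        raise ValueError(scheme)
    return x_new
```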

Objective function

ObjectiveFunctionX.m is the objective function that takes a solution x as input and returns an objective value/score, that is,

score <-- ObjectiveFunctionX(x_current)

In this particular application, the objective function consists of 3 ingredients:

  1. The classification accuracy
  2. The cost of the solution depending on the number of voxels used in the solution
  3. The user-defined regularization constant c, indicating to what extent we want the cost to be involved. If we do not want the cost to affect the objective function, c is set to 0.
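A sketch of what such an objective might look like in Python. Assumptions (the exact form in ObjectiveFunctionX.m may differ): the cost is the fraction of voxels used, and it is subtracted from the accuracy with weight c:

```python
def objective(x, accuracy_fn, c=0.0):
    """Score a voxel-inclusion mask x.

    accuracy_fn(x) returns the classifier's accuracy using only the
    voxels selected by x.  The cost term (here, the fraction of voxels
    used; the real cost function may differ) is weighted by the
    user-defined constant c; with c = 0 the cost has no effect.
    """
    cost = sum(x) / len(x)          # fraction of voxels in the solution
    return accuracy_fn(x) - c * cost
```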

Experiment on the ventral temporal (VT) cortex

* (averaged over 10 runs of 10-fold cross-validation)

** (time per fold)

My plan is to pick only a few combinations, perhaps {c = 0, randomly shuffle, 50-50} and {c = 0, MI-descending, 50-50}, to use with GNB and LR, just to save some time.

Another strategy to try is "aggregate and prune": in the first few rounds, we just keep adding new voxels even if they do not increase the accuracy, and in the final rounds, we start pruning the voxels whose removal does not degrade the performance.
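A rough sketch of this proposed aggregate-and-prune heuristic (untested idea; names like `accuracy_fn`, which evaluates a voxel mask, are hypothetical):

```python
def aggregate_and_prune(n_voxels, accuracy_fn, n_grow, order=None):
    """Two-phase heuristic: aggregate first, prune afterwards.

    Phase 1 (aggregate): add the first n_grow voxels in the given order
    even when they do not improve accuracy.
    Phase 2 (prune): drop any voxel whose removal does not degrade
    accuracy.
    """
    order = order if order is not None else list(range(n_voxels))
    mask = [False] * n_voxels
    for i in order[:n_grow]:          # aggregate phase
        mask[i] = True
    acc = accuracy_fn(mask)
    for i in order[:n_grow]:          # prune phase
        mask[i] = False
        acc_without = accuracy_fn(mask)
        if acc_without >= acc:        # removal does not hurt: keep it out
            acc = acc_without
        else:
            mask[i] = True            # removal hurts: restore the voxel
    return mask
```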

And once things are pretty clear, we can plan to work on the whole brain accordingly.

Experiment on the whole brain

The table shows the experimental results using the whole brain as ROI.

* (averaged over 10 runs of 10-fold cross-validation)

** (time per fold)

*** Use SA with 5000 iterations instead, and use pre-calculated MI to save time. Ideally, MI should be calculated for every fold on the training data only; however, with >43000 features and 100 experiment runs, that would take too long. So I calculate MI once and for all on all the data (both train and test). In practice, the MI calculated from the training set (9 folds) alone and from the whole set, (9-fold) training + (1-fold) test, should be nearly the same, so this approximation should not change the feature ordering much, as only one fold is held out for testing.
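For reference, the once-and-for-all MI ranking can be sketched as follows (Python sketch with a simple plug-in MI estimator for discrete values; the real voxel time series would need to be discretized first):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in MI estimate (in bits) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))      # joint counts
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mi_descending_order(voxels, labels):
    """Rank voxel indices by MI with the class labels, computed once on
    all the data (the approximation described above).  voxels[i] is the
    discretized series of voxel i."""
    scores = [mutual_information(v, labels) for v in voxels]
    return sorted(range(len(voxels)), key=lambda i: -scores[i])
```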

Things that need to be improved

  1. For the whole-brain experiment (43000 voxels), using only 5000 iterations might not be appropriate, since the SA cannot even visit every voxel once. Instead, the iteration budget should cover a multiple of 43000 so that all the voxels are revisited several times, yielding a better solution.
  2. GNB in MATLAB sometimes gives an error. I will probably have to code GNB myself.
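If it comes to coding GNB by hand, a minimal version is short. Here is a Python sketch (the real implementation would be MATLAB); it floors the per-class variance, since zero within-class variance is one plausible cause of such errors:

```python
import math
from collections import defaultdict

class GaussianNB:
    """Minimal Gaussian naive Bayes: per-class, per-feature mean and
    variance, maximum a posteriori prediction."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        self.stats = {}
        n = len(y)
        for c, rows in groups.items():
            m = len(rows)
            means = [sum(col) / m for col in zip(*rows)]
            # Floor the variance: zero within-class variance would
            # otherwise cause a division by zero below.
            varis = [max(sum((v - mu) ** 2 for v in col) / m, 1e-9)
                     for col, mu in zip(zip(*rows), means)]
            self.stats[c] = (math.log(m / n), means, varis)
        return self

    def predict(self, x):
        def log_post(c):
            prior, means, varis = self.stats[c]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * s) - (v - mu) ** 2 / (2 * s)
                for v, mu, s in zip(x, means, varis))
        return max(self.stats, key=log_post)
```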