Classify using n-fold cross validation
First, we may want to generate a data set (this step is commented out below):
% n-fold cross validation on matrix
% data: N x D matrix, each row represents the feature vector of an observation
% run: N x 1 matrix containing the run number of each observation
% label: N x 1 matrix containing the label for each observation
clear
close all
clc
%% Generate a data set containing run information
% % % dirData = './data';
% % % Nc = 10;
% % % Ns = 100;
% % % h = 15;
% % % r = 3;
% % % [data, label, run] = generateSpiralDataWithLabels(Nc,Ns,h,r);
% % % save(fullfile(dirData,'spiral_Nc10_cv'),'data','label','run');
Or, if we already have a data set, we can simply load it here. The data needs a little rearranging: to do n-fold cross validation, we need to specify which observation belongs to which fold. We do that with a vector "run" that records the run number of each observation; the run numbers are then mapped to folds, and everything is sorted by fold.
%%
% Load the data and rearrange it for n-fold cross validation
% Load the data
dirData = './data';
load(fullfile(dirData,'spiral_Nc10_cv'));
data = data(:,1:2);
% rearranging the data
labelList = unique(label);
NClass = length(labelList);
[Ns, D] = size(data);
% Split the observations into 5 folds
Ncv = 5;
% Map each run number to a fold index in 1..Ncv
runNew = mod(run,Ncv)+1;
% plot the figure before rearranging
figure;
subplot(1,4,1); imagesc(runNew); title('modulo of the run#'); colorbar;
subplot(1,4,2); imagesc(run); title('original run number'); colorbar;
subplot(1,4,3); imagesc(label); title('label'); colorbar;
subplot(1,4,4); imagesc(data); title('feature'); colorbar;
% sort everything according to the run
[runSorted, permMatrix] = sortrows(runNew);
labelSorted = label(permMatrix);
dataSorted = data(permMatrix,:);
figure;
subplot(1,3,1); imagesc(runSorted); title('sorted run #'); colorbar;
subplot(1,3,2); imagesc(labelSorted); title('sorted label'); colorbar;
subplot(1,3,3); imagesc(dataSorted); title('sorted data'); colorbar;
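The fold assignment and sorting used above can be checked on a small toy example. This is only a sketch: the run numbers and labels below (runToy, labelToy) are made-up values, not the spiral data set, and base `sort` is used in place of `sortrows` since the input is a single column.

```matlab
% Toy run numbers and labels for 12 observations (hypothetical values)
runToy   = (1:12)';
labelToy = [1 1 1 1 2 2 2 2 3 3 3 3]';
Ncv = 5;

% Map each run number to a fold index in 1..Ncv
foldToy = mod(runToy, Ncv) + 1;

% Sort by fold and reuse the permutation so the labels stay aligned
[foldSorted, perm] = sort(foldToy);
labelToySorted = labelToy(perm);

% Count how many observations land in each fold
foldCount = accumarray(foldSorted, 1)';
disp(foldCount);
```

When the number of runs is not a multiple of Ncv, as here, the folds end up with slightly different sizes.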
Next, for each fold, we train the SVM model with the given parameters c and gamma on the remaining folds and classify the held-out fold as the test set.
%%
% Prepare/initialize some matrices to store some information
confusionMatrix = zeros(NClass,NClass,Ncv);
order = zeros(NClass,Ncv);
totalAccuracy = zeros(1,Ncv);
predictedLabel = zeros(size(labelSorted));
decisValueWinner = zeros(size(labelSorted));
% SVM parameters: -c is the cost c, -g is the kernel gamma
% the best parameters are obtained from a separate cross validation (parameter search) process
bestParam = '-q -c 64 -g 0.015625';
for ncv = 1:Ncv
    % Pick one fold at a time as the test set; train on the remaining folds
    testIndex = runSorted == ncv;
    trainIndex = runSorted ~= ncv;
    trainData = dataSorted(trainIndex,:);
    trainLabel = labelSorted(trainIndex,:);
    testData = dataSorted(testIndex,:);
    testLabel = labelSorted(testIndex,:);
    NTest = sum(testIndex,1);
    % #######################
    % Train the SVM in one-vs-rest (OVR) mode
    % #######################
    model = ovrtrainBot(trainLabel, trainData, bestParam);
    % #######################
    % Classify samples using OVR model
    % #######################
    [predict_label, accuracy, decis_values] = ovrpredictBot(testLabel, testData, model);
    % The winner is the column with the largest decision value. Using the
    % column index as the label assumes the classes are labeled 1..NClass
    % in the same order as the columns of decis_values.
    [decis_value_winner, label_out] = max(decis_values,[],2);
    predictedLabel(testIndex) = label_out;
    decisValueWinner(testIndex) = decis_value_winner;
    % #######################
    % Make confusion matrix
    % (the assignment below assumes every class occurs in each test fold)
    % #######################
    [confusionMatrix(:,:,ncv),order(:,ncv)] = confusionmat(testLabel,label_out);
    totalAccuracy(ncv) = trace(confusionMatrix(:,:,ncv))/NTest;
    disp(['Fold ', num2str(ncv),' -- Total accuracy from the SVM: ',num2str(totalAccuracy(ncv)*100),'%']);
    % Note: confusionMatrix as returned by confusionmat has
    %   row: ground-truth class label
    %   column: predicted class label
    % For plotting we want the conventional layout, which has
    %   row: predicted
    %   column: actual
    % so the matrix is transposed before it is displayed.
    % % % % Plot the confusion matrix for each fold
    % % % figure; imagesc(confusionMatrix(:,:,ncv)');
    % % % xlabel('actual class label');
    % % % ylabel('predicted class label');
    % % % title(['confusion matrix for fold ',num2str(ncv)]);
end
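To make the winner selection and the per-fold accuracy concrete, here is a minimal sketch on made-up decision values. It assumes the classes are labeled 1..NClass and that column k of the decision matrix belongs to class k, which may not match what ovrpredictBot returns for other label sets. The confusion matrix is built with base `accumarray` instead of `confusionmat`, purely so the sketch runs without any toolbox.

```matlab
% Hypothetical decision values: 4 test samples (rows) x 3 classes (columns)
decisToy = [ 0.9 -0.2 -0.5;
            -0.1  0.4  0.3;
            -0.7  0.2  0.6;
             0.1  0.5 -0.3];
trueToy = [1; 2; 3; 1];
NClassToy = 3;

% Winner = column with the largest decision value in each row
[winnerValue, predToy] = max(decisToy, [], 2);

% Confusion matrix: rows = true class, columns = predicted class
C = accumarray([trueToy, predToy], 1, [NClassToy, NClassToy]);

% Accuracy = correctly classified samples (diagonal) / number of test samples
accToy = trace(C) / numel(trueToy);
disp(C);
disp(accToy);
```

Fixing the output size to NClassToy x NClassToy keeps the matrix shape stable even when a class is missing from a fold.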
The output of the 5-fold cross validation is:
Fold 1 -- Total accuracy from the SVM: 65.5%
Fold 2 -- Total accuracy from the SVM: 67%
Fold 3 -- Total accuracy from the SVM: 64.5%
Fold 4 -- Total accuracy from the SVM: 60%
Fold 5 -- Total accuracy from the SVM: 66%
Total accuracy from 5-fold cross validation is 64.6%
Once every fold has been classified, we calculate the overall accuracy and the overall confusion matrix.
% #######################
% Make confusion matrix for the overall classification
% #######################
[confusionMatrixAll,orderAll] = confusionmat(labelSorted,predictedLabel);
figure; imagesc(confusionMatrixAll');
xlabel('actual class label');
ylabel('predicted class label');
title('confusion matrix for overall classification');
% Calculate the overall accuracy from the overall predicted class label
accuracyAll = trace(confusionMatrixAll)/Ns;
disp(['Total accuracy from ',num2str(Ncv),'-fold cross validation is ',num2str(accuracyAll*100),'%']);
% % % % Alternatively, average the per-fold accuracies
% % % % This gives the same value as the method above when all folds have the same size
% % % avgAccuracy = mean(totalAccuracy);
% % % disp(['average accuracy is ',num2str(avgAccuracy*100),'%']);
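The two ways of computing the overall accuracy (pooling all predictions versus averaging the per-fold accuracies) agree only when the folds have equal size. A small sketch with made-up counts illustrates the difference; all numbers below are hypothetical.

```matlab
% Hypothetical counts of correct predictions and test samples per fold
nCorrectEq = [9 5];   nTestEq = [10 10];   % equal fold sizes
nCorrectUn = [9 4];   nTestUn = [10 8];    % unequal fold sizes

% Mean of per-fold accuracies
meanAccEq = mean(nCorrectEq ./ nTestEq);
meanAccUn = mean(nCorrectUn ./ nTestUn);

% Pooled accuracy over all test samples
pooledAccEq = sum(nCorrectEq) / sum(nTestEq);
pooledAccUn = sum(nCorrectUn) / sum(nTestUn);

disp([meanAccEq pooledAccEq]);   % identical for equal folds
disp([meanAccUn pooledAccUn]);   % differ for unequal folds
```

In the run reported above, the five fold accuracies average to the same 64.6% as the pooled figure, which is consistent with equally sized folds.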
Finally, the results are plotted.
% Compare the actual and predicted class
figure;
subplot(1,2,1); imagesc(labelSorted); title('actual class');
subplot(1,2,2); imagesc(predictedLabel); title('predicted class');
% ################################
% Plot the classification results
% ################################
% plot the true labels (larger markers, size scaled by the winning decision value)
patchSize = 20*exp(decisValueWinner);
colorList = generateColorList(NClass);
colorPlot = colorList(labelSorted,:);
figure;
scatter(dataSorted(:,1),dataSorted(:,2),patchSize, colorPlot,'filled'); hold on;
% plot the predicted labels on top (smaller markers)
patchSize = 10*exp(decisValueWinner);
colorPlot = colorList(predictedLabel,:);
scatter(dataSorted(:,1),dataSorted(:,2),patchSize, colorPlot,'filled');
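The helper generateColorList is not shown here. As a rough stand-in, assuming it only needs to return an NClass x 3 matrix of RGB colors, the built-in `hsv` colormap can be used. The sketch below also shows how the marker colors and sizes are derived; the labels and decision values are made up.

```matlab
% Hypothetical labels and winning decision values for 4 points
NClassToy = 3;
labelToy  = [1; 3; 2; 3];
decisToy  = [0; 1; -1; 0.5];

% Stand-in for generateColorList: one RGB row per class (an assumption)
colorListToy = hsv(NClassToy);

% One color per point, looked up by class label
colorPlotToy = colorListToy(labelToy, :);

% Marker area grows exponentially with the winning decision value,
% so it is always positive and confident points are drawn larger
patchSizeToy = 20 * exp(decisToy);
disp(patchSizeToy');
```

These two arrays can be passed to `scatter` as the size and color arguments, in the same way as patchSize and colorPlot above.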