Sparse regularization on lexical data

Here are some results from using L1- and L2-norm regularization with logistic regression on lexical data.

code: demo3.m

function: Applies ridge, lasso, and elastic-net regularization with LR. The optimal parameters lambda and alpha are selected for the whole dataset one single time; that is, the parameters are shared across all runs.
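
demo3.m itself is not shown here; below is a minimal sketch of the same idea using MATLAB's built-in lassoglm (the actual code may use a different package). X is the n-by-m data matrix and y the 0/1 labels; all parameter values are illustrative assumptions.

  % Minimal sketch: LR with lasso/ridge/elastic-net via lassoglm
  % (Statistics Toolbox). X: n-by-m data matrix, y: 0/1 labels.
  alpha   = 0.5;                   % 1 = lasso; near 0 = ridge-like; between = elastic net
  lambdas = logspace(-3, 0, 20);   % regularization path to sweep
  [B, FitInfo] = lassoglm(X, y, 'binomial', 'Alpha', alpha, 'Lambda', lambdas);

  % Evaluate one point on the path; non-zero coefficients = selected voxels
  k    = 10;                               % index into FitInfo.Lambda
  b    = [FitInfo.Intercept(k); B(:, k)];  % intercept first, as glmval expects
  yhat = glmval(b, X, 'logit') > 0.5;
  acc  = mean(yhat == y);
  n_selected = nnz(B(:, k));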

Data matrices produced

The data matrix is stored in the directory:

/NAS_II/Projects/MVPA_Language/lexical/${subj}/stats_images_fmri

each with the name:

${data}_sexemplar_beta_tstat_matrix_${mask}.mat

where

${data} = 'animtool', 'mamnonmam', 'allmamnonmam'

${mask} = 'lh_mask_vtc2', 'mask_vtc2', 'gray_mask2'
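
For example, loading one of these files might look like the sketch below (the names of the variables stored inside the .mat file are not documented here; whos lists them):

  % Sketch: build the path to a data-matrix file and inspect/load it
  subj  = '3211';
  data  = 'animtool';
  mask  = 'lh_mask_vtc2';
  fname = fullfile('/NAS_II/Projects/MVPA_Language/lexical', subj, ...
      'stats_images_fmri', ...
      sprintf('%s_sexemplar_beta_tstat_matrix_%s.mat', data, mask));
  whos('-file', fname)   % list the variables stored in the file
  S = load(fname);       % load them into a struct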

The workflow and the data at each step

Raw data --> whole-scan data matrix --> task-specific data matrix --> sweep-parameter classification results --> best-parameter classification result for each subject

Raw data --->

some code (I don't remember which)

---> whole-scan data matrix --->

/mnt/home/kittipat/Dropbox/random_matlab_code/fMRI/lexical_project/rdm/

make_datamatrix_allmamnonmam.m

make_datamatrix_animtool.m

make_datamatrix_mamnonmam.m

---> task-specific data matrix --->

/mnt/home/kittipat/Dropbox/random_matlab_code/fMRI/lexical_project/lr_regularization/

experiments_animtool_4fold.m

experiments_mamnonmam_4fold.m

experiments_animtool_lou.m

experiments_mamnonmam_lou.m

---> sweep-parameter classification results --->

/mnt/home/kittipat/Dropbox/random_matlab_code/fMRI/lexical_project/lr_regularization/

summarize_out_4fold.m

summarize_out_lou.m (not completed yet)

---> the best results from a selected criterion

More details regarding the stored files:

raw data: the NIfTI file containing beta coefficients for each observation.

dir: /NAS_II/Projects/MVPA_Language/lexical/${subject}/stats_images_fmri/{run,srun}${run_id}_${data_level}

file: ${exemplar_name}_{zstat,beta}.nii.gz

where

subject = {3211, 3402, 3424, ...}

run_id = {1,2,3,4}

run = regular run

srun = smoothed run

data_level = {exemplar, item}

whole-scan data matrix: the n-by-m data matrix, where n and m are the number of observations and the number of features, respectively; m is the number of voxels in the whole scan. Each row of the data matrix represents the brain response to the particular stimulus presented to the subject.

dir: /NAS_II/Projects/MVPA_Language/lexical/${subject}/stats_images_fmri/

file: filtered_${data_level}_beta_tstat_matrix.mat

data_level = {exemplar, sexemplar, item, sitem}
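
The make_datamatrix_*.m scripts listed above produce these matrices; their sources are not shown here, but the gist of assembling an n-by-m matrix from per-observation beta images could look like this sketch (file names are placeholders; niftiread needs MATLAB R2017b+):

  % Sketch: stack per-observation beta volumes into an n-by-m data matrix
  beta_files = {'obs1_beta.nii.gz', 'obs2_beta.nii.gz'};  % placeholders
  n = numel(beta_files);
  m = numel(niftiread(beta_files{1}));   % one feature per whole-scan voxel
  X = zeros(n, m);
  for i = 1:n
      vol = niftiread(beta_files{i});
      X(i, :) = vol(:)';   % row i = brain response to stimulus i
  end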

task-mask-specific data matrix: this data matrix is trimmed by the choice of a specific task, e.g., "animals vs tools" or "mammals vs non-mammals", and the choice of mask, e.g., "lh_mask_vtc2", "mask_vtc2", "gray_mask2". So it is a submatrix of the whole-scan data matrix.

dir: /NAS_II/Projects/MVPA_Language/lexical/${subject}/stats_images_fmri/

file: ${task}_${data_level}_beta_tstat_matrix_${mask}.mat

task = {animtool, mamnonmam, allmamnonmam}

data_level = {exemplar, sexemplar, item, sitem}

mask = {lh_mask_vtc2, mask_vtc2, gray_mask2}
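
Conceptually the trimming is just row/column selection; a sketch, assuming X is the whole-scan matrix and the mask is a binary volume (file and variable names are placeholders):

  % Sketch: keep only the voxels inside a binary mask
  mask_vol = niftiread('lh_mask_vtc2.nii.gz') > 0;   % placeholder mask file
  keep     = mask_vol(:)';            % logical index over whole-scan voxels
  X_task   = X(task_rows, keep);      % task_rows selects the task's trials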

sweep-parameter classification results: here we report the classification results, including the train/validation/test accuracy for each (lambda, alpha) pair, the number of non-zero coefficients, etc. The optimal parameters and the best accuracy can be read off from here.

dir: /NAS_II/Projects/MVPA_Language/lexical/${subject}/sparse_LR

file: ${subject}_${mask}_combined_${task}_${cv_type}.mat

cv_type = {lou, 4fold}

best-parameter classification result for each subject: we pick the optimal parameters from the previous step using whichever criterion we like. The result here is for the optimal parameters of each subject.

dir: /NAS_II/Projects/MVPA_Language/lexical/${subject}/sparse_LR

file: ${subject}_${mask}_${classifier_regu}_${criterion}_${task}_${cv_type}.mat

classifier_regu = {lr_lasso, lr_ridge, lr_none, lr_elnet}

criterion = {cri1, cri2} // criterion #1, #2

Experimental data produced

Each dataset can be treated in 2 ways:

  • 4fold = 4-fold cross-validation; sweep over alpha and lambda (both from 0 to 1) and report only the single best accuracy. We evaluate the train, validation, and test sets (see the sketch after this list).
    • file: ${subj}_${mask}_combined_${type}_4fold
  • lou = leave-one-out cross-validation; sweep over alpha and lambda
    • file: ${subj}_${mask}_combined_${type}_lou
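
A sketch of the 4-fold sweep using cvpartition and lassoglm (the actual experiments_*.m scripts may differ; note that lassoglm requires alpha in (0,1], so pure ridge is approximated by a small alpha):

  % Sketch: sweep (alpha, lambda) with 4-fold CV, recording validation accuracy
  alphas  = 0.1:0.1:1;            % lassoglm needs alpha in (0,1]
  lambdas = logspace(-3, 0, 20);
  cvp = cvpartition(y, 'KFold', 4);
  acc_valid = zeros(numel(alphas), numel(lambdas), cvp.NumTestSets);
  for f = 1:cvp.NumTestSets
      tr = training(cvp, f);  te = test(cvp, f);
      for a = 1:numel(alphas)
          [B, Fit] = lassoglm(X(tr,:), y(tr), 'binomial', ...
              'Alpha', alphas(a), 'Lambda', lambdas);
          for l = 1:numel(Fit.Lambda)   % columns of B match Fit.Lambda
              b = [Fit.Intercept(l); B(:, l)];
              pred = glmval(b, X(te,:), 'logit') > 0.5;
              acc_valid(a, l, f) = mean(pred == y(te));
          end
      end
  end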

Official results #1: 4-fold cross-validation

(using the code summarize_out_4fold.m)

There are 2 experiments:

  • animals vs tools
  • mammals vs nonmammals

Each is reported on 3 masks:

  • lh_mask_vtc2
  • mask_vtc2
  • gray_mask2

There are 4 types of classifiers used in this experiment:

  • LR+lasso
  • LR+ridge
  • LR+elastic net
  • LR alone without regularization

For optimal parameter selection, I use 2 criteria:

  • criterion#1: pick the parameters (lambda, alpha) based on the maximum validation accuracy alone. However, the consistency of voxels across the 4 folds under this criterion is not desirable, so I propose another criterion.
  • criterion#2: pick the parameters (lambda, alpha) such that 1) the train accuracy is greater than 0.85, 2) the number of non-zero coefficients (selected voxels) is greater than 20, 3) among those, the validation accuracy is greatest, and 4) alpha and lambda act as tie-breakers, with greater values preferred since they are more sparse --> greater interpretability (see the sketch below).
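
Criterion #2 amounts to a feasibility filter followed by lexicographic tie-breaking. A sketch, assuming acc_train, acc_valid, and nnz_coef are numel(alphas)-by-numel(lambdas) arrays averaged over folds:

  % Sketch of criterion #2: constrain, maximize validation accuracy,
  % then break ties by the larger alpha, then the larger lambda.
  [A, L] = ndgrid(alphas, lambdas);
  ok = (acc_train > 0.85) & (nnz_coef > 20);        % constraints 1) and 2)
  score = acc_valid;
  score(~ok) = -Inf;                                % rule out infeasible cells
  cand = find(score == max(score(:)));              % ties on validation accuracy
  T = sortrows([A(cand), L(cand), cand], [-1 -2]);  % alpha desc, then lambda desc
  [a_star, l_star] = ind2sub(size(score), T(1, 3));
  best = [alphas(a_star), lambdas(l_star)];         % the selected (alpha*, lambda*)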

The reported results include

  • The optimally selected parameters alpha*, lambda* for the 2 criteria.
  • The accuracy on the train, validation, and test sets: acc_train, acc_valid, acc_test
  • The sparsity, in terms of the number of non-zero coefficients
  • The consistency of the selected voxels across the 4 folds within the same subject. Consistency is measured by how many folds include each voxel, counted at several levels (see the sketch after this list):
    • #intersect>0: number of voxels that appear in at least one fold
    • #intersect>0.25: number of voxels that appear in more than 1/4 of the folds
    • #intersect>0.5: number of voxels that appear in more than 1/2 of the folds
    • #intersect>0.75: number of voxels that appear in more than 3/4 of the folds
    • #intersect=1: number of voxels that appear in every fold
    • #intersect=0: number of voxels that never appear in any fold
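
These counts can be computed directly from the per-fold coefficient vectors; a sketch, assuming B_folds is an m-by-4 matrix with one coefficient vector per fold:

  % Sketch: voxel consistency across folds; "selected" = non-zero coefficient
  frac = mean(B_folds ~= 0, 2);   % fraction of folds selecting each voxel
  n_intersect = [sum(frac > 0), sum(frac > 0.25), sum(frac > 0.5), ...
                 sum(frac > 0.75), sum(frac == 1), sum(frac == 0)];
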
Experimental results on lexical data using 4-fold cross-validation

The data log from post-processing the data for each subject (produced by summarize_out_4fold.m).

best results for each subject
correlation of posterior

The summary table for lexical data

summary_lexical

The summary bar plot when averaging across subjects.

Official results #2: leave-one-observation-out cross-validation

(using the code summarize_out_lou.m)

The setup is identical to Official results #1 above: the same 2 experiments (animals vs tools, mammals vs non-mammals), the same 3 masks, the same 4 classifiers, the same 2 parameter-selection criteria, and the same reported quantities; only the cross-validation scheme differs (leave-one-observation-out instead of 4-fold).
Experimental results on lexical data using leave-one-out cross-validation