These are papers, reviews and tutorials that I am reading or have enjoyed reading. The list below is a bit dated: we now keep track of interesting papers and publications in Mendeley. Private paper repo: https://klab-papers.herokuapp.com/

### Interesting Papers

Tag | Topic/Name | URL/Link | Type | Description | Date |
---|---|---|---|---|---|
Bayesian Networks | Great List of papers on Bayesian Learning | http://cocosci.berkeley.edu/tom/bayes.html | Review | Great list of papers on Bayesian learning | January 12, 2010 |
Machine Learning | Measuring and testing dependence by correlation of distances | http://projecteuclid.org/euclid.aos/1201012979 | Paper | Distance correlation: a dependence measure in [0, 1] between two random vectors. It captures non-linear dependence and equals 0 if and only if the two vectors are independent | January 25, 2007 |
Computational Biology | Enhancing scatterplots with smoothed densities | http://bioinformatics.oxfordjournals.org/cgi/content/abstract/20/5/623 | Paper | Plotting high-density scatter plots with smoothing and transparency | January 16, 2003 |
Computational Biology | How does multiple testing correction work? | http://www.nature.com/nbt/journal/v27/n12/full/nbt1209-1135.html | Review | Multiple hypothesis testing and correction in biology | January 7, 2010 |
Computational Biology | Simcluster: clustering enumeration gene expression data on the simplex space | http://www.biomedcentral.com/1471-2105/8/246 | Paper | Clustering of gene expression data using the Aitchison distance, which is useful for any data that lives in the simplex (e.g. probabilities or data that sum to a constant) | December 27, 2009 |
Biology | Sequencing technologies — the next generation | http://www.nature.com/nrg/journal/vaop/ncurrent/abs/nrg2626.html | Review | Review of next-generation sequencing | December 17, 2009 |
Computational Biology | ARTS: Accurate Recognition of Transcription Starts in Human | http://www.fml.tuebingen.mpg.de/raetsch/suppl/arts | Paper | Multiple string kernels with SVMs for TSS prediction | November 16, 2006 |
Machine Learning | Clustering with shallow trees | http://arxiv.org/abs/0910.0767 | Paper | Clustering method intermediate between single-linkage hierarchical clustering and affinity propagation | November 16, 2009 |
Biology | ChIP–seq: advantages and challenges of a maturing technology | http://www.nature.com/nrg/journal/v10/n10/full/nrg2641.html | Review | Review paper on ChIP-seq and its applications | September 26, 2009 |
Computational Biology | High-throughput chromatin information enables accurate tissue-specific prediction of transcription factor binding sites | http://nar.oxfordjournals.org/cgi/content/full/37/1/14 | Paper | Integration of chromatin mark data improves TFBS prediction | September 22, 2009 |
Machine Learning | Deep Belief Networks | http://www.iro.umontreal.ca/~lisa/publications/index.php?page=publication&kind=single&ID=209 | Review | Review by Yoshua Bengio on Deep Belief Networks | September 21, 2009 |
Computational | Dendroscope | http://www-ab.informatik.uni-tuebingen.de/software/dendroscope/welcome.html | Software | Software for visualizing massive networks and trees | September 19, 2009 |
Machine Learning | Lasso, elastic net and ridge regression code by Friedman, Hastie and Tibshirani | http://www-stat.stanford.edu/~tibs/glmnet-matlab/ | Software | MATLAB and R code (glmnet package) | September 8, 2009 |
Computational Biology | BedTools: utilities for comparing genomic features in BED format | http://people.virginia.edu/~arq5x/bedtools.html | Software | BedTools: utilities for comparing genomic features in BED format | September 1, 2009 |
Machine Learning | VOWPAL WABBIT: Sparse online learning via truncated gradient | http://www.research.rutgers.edu/~lihong/pub/Langford09Sparse-JMLR.pdf | Paper | Very fast online learning | August 24, 2009 |
Biology | Long noncoding RNAs: functional surprises from the RNA world | http://genesdev.cshlp.org/content/23/13/1494.short?rss=1 | Review | Review on long non-coding RNAs | July 30, 2009 |
Boosting | ASSEMBLE: Exploiting Unlabeled Data in Ensemble Methods | http://www.rpi.edu/~bennek/kdd-KristinBennett1.pdf | Paper | Semi-supervised boosting | July 18, 2002 |
Machine Learning | Review on semi-supervised learning | http://pages.cs.wisc.edu/~jerryzhu/research/ssl/semireview.html | Review | Review on semi-supervised learning | July 23, 2009 |
Boosting | Entropy regularized boosting tutorial | http://www.cse.ucsc.edu/~manfred/pubs/tut/icml2009/micml.pdf | Review | Manfred Warmuth's talk on Entropy Regularized Boosting | July 14, 2009 |
Machine Learning | Tutorial on Machine Learning reductions | http://hunch.net/~reductions_tutorial/ | Review | How to convert one type of learning problem into another | July 14, 2009 |
Machine Learning | Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions | http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1423975 | Review | Good review on recommendation systems, collaborative filtering and the like | July 30, 2005 |
Machine Learning | Network-constrained regularization and variable selection for analysis of genomic data | http://bioinformatics.oxfordjournals.org/cgi/content/full/24/9/1175 | Paper | Network-constrained regularized regression | July 7, 2009 |
Computational Biology | From DNA sequence to transcriptional behaviour: a quantitative approach | http://www.nature.com/nrg/journal/v10/n7/abs/nrg2591.html | Review | Transcription, sequence and nucleosome positioning; review by Eran Segal | June 27, 2009 |
Biology | Current-generation high-throughput sequencing: deepening insights into mammalian transcriptomes | http://genesdev.cshlp.org/content/23/12/1379.full | Review | Next-generation sequencing of transcriptomes | June 21, 2009 |
Machine Learning | Measuring classifier performance: a coherent alternative to the area under the ROC curve | http://www.springerlink.com/content/y35743hp7010g354/ | Paper | An alternative to AUC for measuring classifier performance | June 21, 2009 |
Computational Biology | Analytical methods for inferring functional effects of single base pair substitutions in human cancers | http://www.springerlink.com/content/c86418m69u475231/ | Review | Inferring functions from mutations in cancer | June 16, 2009 |
Machine Learning | Active learning tutorial | http://hunch.net/~active_learning/ | Review | Active learning tutorial | June 15, 2009 |
Machine Learning | Learning Nonlinear Dynamic Models | http://arxiv.org/abs/0905.3369 | Paper | A different approach for learning HMM/DBN type models | June 12, 2009 |
Computational | GNU Linear Programming Kit (GLPK) | http://www.gnu.org/software/glpk/ | Software | Library for linear and mixed-integer programming | June 9, 2009 |
Machine Learning | The Entire Regularization Path for the Support Vector Machine | http://www.jmlr.csail.mit.edu/papers/volume5/hastie04a/hastie04a.pdf | Paper | How to efficiently compute the SVM solution for every value of the regularization parameter C | June 9, 2009 |
Computational Biology | Genome-wide association analysis by lasso penalized logistic regression | http://bioinformatics.oxfordjournals.org/cgi/content/full/25/6/714 | Paper | A good methodology to try when the number of features is much larger than the number of training examples | June 9, 2009 |
Boosting | Topics in Regularization and Boosting | http://www-stat.stanford.edu/~hastie/THESES/saharon_rosset.pdf | Review | Great thesis on various types of regularization in boosting and SVMs | June 9, 2009 |
Machine Learning | Grafting: fast, incremental feature selection by gradient descent in function space | http://portal.acm.org/citation.cfm?id=944976 | Paper | The regularization term can be used to decide when to stop feature selection; the same idea gives a stopping criterion for boosting | March 19, 2003 |
Biology | Deep cap analysis gene expression (CAGE) | http://www.biotechniques.com/biotechniques/multimedia/archive/00003/BTN_A_000112802_O_3724a.pdf | Review | Description of deep CAGE technology for identifying TSSs | May 28, 2009 |
Biology | Fundamental concepts in genetics | http://www.nature.com/nrg/series/fundamental/index.html | Review | Nature Reviews Genetics series on fundamental concepts in genetics | May 26, 2009 |
Biology | Genetic Mapping in Human Disease | http://www.sciencemag.org/cgi/content/full/322/5903/881 | Review | Review on genome-wide association studies by David Altshuler | May 27, 2008 |
Computational Biology | Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields | http://bioinformatics.oxfordjournals.org/cgi/content/full/25/10/1307 | Paper | L1 regularized optimization models for CNV (Rob Schapire) | May 24, 2009 |
Computational Biology | Statistical Inference in mRNA-Seq: Exploratory Data Analysis and Differential Expression | http://www.bepress.com/ucbbiostat/paper247/ | Paper | mRNA-seq data normalization and differential expression | May 14, 2009 |
Computational | Probabilistic inference using MCMC methods | http://www.cs.toronto.edu/~radford/ftp/review.pdf | Review | MCMC, Gibbs sampling and other sampling methods | September 27, 1993 |
Boosting | Boosting Algorithms: Regularization, Prediction and Model Fitting | ftp://ftp.stat.math.ethz.ch/Research-Reports/Other-Manuscripts/buhlmann/BuehlmannHothorn_Boosting-rev2.pdf | Review | A great statistical review of boosting (regression and classification) | June 4, 2007 |
Machine Learning | VFML (Very Fast Machine Learning) toolkit | http://www.cs.washington.edu/dm/vfml/ | Software | Toolkit for very fast online learning with decision trees and Bayesian learning | April 18, 2009 |
Computational | On Estimation of a Probability Density Function and Mode | http://projecteuclid.org/euclid.aoms/1177704472 | Paper | Kernel density estimation | May 28, 1962 |
Machine Learning | Modification of Correlation Kernels in SVM, KPCA and KCCA in Texture Classification | http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=01556208 | Paper | Various kernels for sequence/waveform data | May 8, 2009 |
Machine Learning | Pattern Recognition Using Higher-Order Local Autocorrelation coefficients | http://www.google.com/search?q=Pattern+Recognition+Using+Higher-Order+Local+Autocorrelation+coefficients | Paper | Efficient computation of higher-order cross-correlation kernels | June 24, 2002 |
Machine Learning | Comparison of Combining Methods of Correlation Kernels in kPCA and kCCA for Texture Classification with Kansei Information | http://www.springerlink.com/index/804l53602706185l.pdf | Paper | Various kernels for sequence/waveform data | May 28, 2007 |
Machine Learning | Signal Theory for SVM Kernel Design with applications to parameter estimation and sequence kernels | http://eprints.ecs.soton.ac.uk/15121/1/paper.ps | Paper | Kernels for sequences and waveform signals | May 7, 2009 |
Machine Learning | Computing a nearest symmetric positive semidefinite matrix | http://www.maths.manchester.ac.uk/~nareports/narep126.pdf | Paper | A matrix that should be a kernel (Gram) matrix is sometimes not symmetric positive semidefinite; this paper shows how to compute the nearest symmetric PSD matrix. Useful for kernel computations | May 17, 1988 |
Computational | Notes on Functionals and Functional Derivatives | http://julian.tau.ac.il/~bqs/functionals/functionals.html | Review | Useful for understanding functional gradient descent | |
Boosting | mboost package documentation | http://cran.okada.jp.org/web/packages/mboost/mboost.pdf | Software | Documentation of the mboost R package by Peter Buhlmann | May 2, 2009 |
Teaching | 10 simple rules to mix teaching with research | http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000358 | Review | | April 27, 2009 |
Machine Learning | Apache Mahout | http://lucene.apache.org/mahout/ | Software | MapReduce-based machine learning implementations | April 18, 2009 |
Machine Learning | IBM Parallel Machine Learning Toolbox | http://www.alphaworks.ibm.com/tech/pml | Software | Parallelized k-means and SVM; NOT open source | April 18, 2009 |
Computational Biology | Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes | http://www.nature.com/nrg/journal/v9/n4/full/nrg2185.html | Review | Review on comparative sequence analysis | April 16, 2008 |
Machine Learning | A kernel for time series based on global alignments | http://arxiv.org/PS_cache/cs/pdf/0610/0610033v1.pdf | Paper | Kernels for time series that are not phased (synchronized) | October 2, 2006 |
Machine Learning | LibSVM: A Library for Support Vector Machines | http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf | Software | Great documentation on implementation details of various types of SVMs for classification, regression, density estimation etc. | April 16, 2009 |
Machine Learning | Analysis of Switching Dynamics with Competing Support Vector Machines | http://www.csie.ntu.edu.tw/~cjlin/papers/ijcnntime.pdf | Paper | Weighted SVMs for segmentation of mixed signals | |
Machine Learning | Cost-Sensitive Learning by Cost-Proportionate Example Weighting | http://hunch.net/~jl/projects/reductions/costing/finalICDM2003.pdf | Paper | Cost-sensitive learning; includes the fabled weighted SVM | April 17, 2003 |
Machine Learning | Map-Reduce for Machine Learning on Multicore | http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf | Paper | Parallelization of machine learning algorithms | October 10, 2006 |
Computational Biology | Software package for primary analysis of Illumina next gen sequencing assays | http://sgenomics.org/swift/ | Software | Highly parallelized C++ for primary data analysis of second-generation sequencing assays | January 24, 2009 |
Computational Biology | SNP imputation in association studies | http://www.nature.com/nbt/journal/v27/n4/abs/nbt0409-349.html | Review | Eran Halperin's review on the use of SNPs and haplotypes for association studies, part 2 | April 13, 2009 |
Computational Biology | Maximizing power in association studies | http://www.nature.com/nbt/journal/v27/n3/full/nbt0309-255.html | Review | Eran Halperin's review on genome-wide association studies, part 1 | April 13, 2009 |
Computational | Convex Optimization | http://www.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf | Review | Book by Stephen Boyd and Lieven Vandenberghe | March 6, 2009 |
Computational Biology | Efficient and accurate P-value computation for Position Weight Matrices | http://www.almob.org/content/2/1/15 | Paper | Thresholds for PWMs based on a p-value cutoff | December 11, 2007 |
Computational | CloudBurst: Highly Sensitive Short Read Mapping with MapReduce | http://apps.sourceforge.net/mediawiki/cloudburst-bio/index.php?title=CloudBurst | Software | Massive parallelization of tag-to-genome mapping and k-mer manipulation, based on Google's MapReduce model and Hadoop | March 18, 2009 |
Boosting | iBoost: Boosting with item set mining | http://www.kyb.mpg.de/bs/people/hiroto/iboost/ | Software | Boosting with itemset mining | January 24, 2009 |
Biology | E2F in vivo binding specificity: Comparison of consensus versus nonconsensus binding sites | http://genome.cshlp.org/content/18/11/1763 | Paper | Discusses TFs that bind sites that do not have consensus motifs | November 13, 2008 |
Machine Learning | Support Vector Regression | http://cs.ecs.baylor.edu/~hamerly/courses/5325_08s/papers/svm/smola2004regression.pdf | Review | Tutorial on support vector regression | November 28, 2008 |
Boosting | Gboost: Graph boosting | http://www.kyb.mpg.de/bs/people/nowozin/gboost/ | Software | Code for boosting with graph mining | January 24, 2009 |
Bayesian Networks | Graphical Models, Exponential Families, and Variational Inference | http://www.nowpublishers.com/product.aspx?product=MAL&doi=2200000001 | Review | Extensive review on graphical models | February 25, 2009 |
Biology | Nucleosome positioning and gene regulation: advances through genomics | http://www.nature.com/nrg/journal/v10/n3/full/nrg2522.html | Review | Great review on the effect of nucleosome positioning on gene regulation | February 21, 2009 |
Computational | Complexity of Finite Functions | http://www.cs.columbia.edu/~rocco/Teaching/S09/6998/Boppana-Sipser-complexity.ps | Review | Excellent review paper on computational complexity by Boppana and Sipser | August 15, 1989 |
Computational | Extremal Combinatorics | http://lovelace.thi.informatik.uni-frankfurt.de/~jukna/EC_Book/index.html | Review | Great book on advanced topics in computational complexity | February 16, 2009 |
Computational Biology | CoreBoost_HM | http://www.ncbi.nlm.nih.gov/pubmed/18997002 | Paper | Boosting to predict TSS using sequence and chromatin modification data | January 6, 2009 |
Computational Biology | CoreBoost | http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1852414 | Paper | Boosting to predict TSS | February 7, 2009 |
Computational | NoteBooks | http://cscs.umich.edu/~crshalizi/notebooks/ | Review | Great set of links to reading material for over 400 topics | February 5, 2009 |
Machine Learning | Olivier Bousquet, Stéphane Boucheron and Gábor Lugosi, "Introduction to Statistical Learning Theory" | http://www.stat.cmu.edu/~larry/=sml2008/BBL.pdf | Review | Review of Statistical Learning Theory | February 5, 2009 |
Machine Learning | Survey on active learning | http://pages.cs.wisc.edu/~bsettles/active-learning | Review | | January 24, 2009 |
Computational | MATLAB CVX package for convex optimization | http://www.stanford.edu/~boyd/cvx/ | Software | MATLAB CVX package for convex optimization | January 24, 2009 |
Computational Biology | Predicting Unobserved Phenotypes for Complex Traits from Whole-Genome SNP Data | http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000231 | Paper | Predicting phenotypic traits from SNP data. Try boosting on it. | November 23, 2008 |
Computational Biology | Activity motifs reveal principles of timing in transcriptional control of the yeast metabolic network | http://www.nature.com/nbt/journal/v26/n11/abs/nbt.1499.html | Paper | Potential project for graph boosting | November 23, 2008 |
Computational Biology | A novel method for comparing topological models of protein structures enhanced with ligand information | http://bioinformatics.oxfordjournals.org/cgi/content/short/24/23/2698?rss=1 | Paper | Protein representation | November 21, 2008 |
Machine Learning | Random Forests | http://www.springerlink.com/content/u0p06167n6173512/ | Paper | Bootstrap-based ensembles of classification and regression trees | November 5, 2004 |
Computational Biology | Bowtie: Ultrafast short-read aligner | http://bowtie-bio.sourceforge.net/ | Software | Fast alignment of tags to genomes using a Burrows-Wheeler index | November 5, 2008 |
Computational | Reducing the Space Requirement of Suffix Trees | http://www.zbh.uni-hamburg.de/staff/kurtz/papers/Kur1999.pdf | Paper | How to implement suffix trees efficiently | November 26, 1999 |
Computational Biology | MUMmer: Ultra fast genome aligner | http://mummer.sourceforge.net/ | Software | Very fast sequence matching and aligning | November 5, 2008 |
Computational Biology | SeqAn: C++ sequence library | http://www.seqan.de | Software | C++ library for sequence manipulation | November 5, 2008 |
Boosting | Gradient Tree Boosting for Training Conditional Random Fields | http://jmlr.csail.mit.edu/papers/v9/dietterich08a.html | Paper | Sequence labeling method | November 4, 2008 |
Boosting | The boosting approach to machine learning: An overview | http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/msri.ps | Review | Introductory review on boosting | November 29, 2003 |
Boosting | An introduction to boosting and leveraging | http://www.ee.technion.ac.il/~rmeir/Publications/MeiRae03.pdf | Review | Detailed review on boosting and ensemble methods | November 29, 2003 |
Computational Biology | Boolean implication networks derived from large scale, whole genome microarray datasets | http://genomebiology.com/2008/9/10/R157 | Paper | Extracting Boolean implications from microarray data; could be a useful preprocessing step before learning | |
Computational Biology | Extracting binary signals from microarray time-course data | http://nar.oxfordjournals.org/cgi/content/full/gkm284v1 | Paper | Simple method for discretization of microarray data (mostly time-course data or data that spans a large dynamic range per gene) | May 1, 2007 |
Machine Learning | Least Angle and L1 Regression: a Review | http://arxiv.org/pdf/0802.0964 | Review | An interesting new method for regression | October 27, 2008 |
Boosting | Sparse Boosting | http://jmlr.csail.mit.edu/papers/volume7/buehlmann06a/buehlmann06a.pdf | Paper | A boosting technique for regression | October 11, 2006 |
Boosting | Improved Boosting Algorithms Using Confidence-rated Predictions | http://www.springerlink.com/content/k8134wq0824k7042/ | Paper | Excellent paper for efficient implementation of AdaBoost and variants (such as abstaining) | December 30, 1999 |
Computational | Conjugate gradient method | http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf | Review | Extremely lucid tutorial on the conjugate gradient method | August 4, 1994 |
Machine Learning | A tutorial introduction to the minimum description length principle | http://arxiv.org/abs/math/0406077 | Review | Review of the MDL principle | June 4, 2004 |
Computational | Compressive Sensing | http://igorcarron.googlepages.com/cs | Review | A great site on compressive sensing (recovering sparse signals from far fewer measurements than conventional sampling requires) | October 27, 2008 |
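
The distance correlation entry above is easiest to appreciate with a small example. Below is a minimal pure-Python sketch of the sample statistic, assuming the V-statistic form: within each variable, pairwise Euclidean distances are double-centered (row, column and grand means removed), and dCor is the average product of the two centered matrices normalized by the corresponding distance variances. The helper names are our own (hypothetical, not from any package), and the O(n²) loops are only meant for small samples.

```python
import math
from numbers import Real
from typing import List, Sequence, Union

# Sketch only: hypothetical helper names; O(n^2) time and memory.
Sample = Union[float, Sequence[float]]


def _as_point(s: Sample) -> List[float]:
    # Scalars are treated as 1-dimensional vectors.
    return [float(s)] if isinstance(s, Real) else [float(c) for c in s]


def _double_centered_distances(samples: Sequence[Sample]) -> List[List[float]]:
    pts = [_as_point(s) for s in samples]
    n = len(pts)
    d = [[math.dist(pts[i], pts[j]) for j in range(n)] for i in range(n)]
    means = [sum(row) / n for row in d]  # d is symmetric, so row means equal column means
    grand = sum(means) / n
    return [[d[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(xs: Sequence[Sample], ys: Sequence[Sample]) -> float:
    """Sample distance correlation in [0, 1] between paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    a = _double_centered_distances(xs)
    b = _double_centered_distances(ys)
    n = len(xs)

    def dcov2(p: List[List[float]], q: List[List[float]]) -> float:
        # Squared sample distance covariance (average of elementwise products).
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dvar_x, dvar_y = dcov2(a, a), dcov2(b, b)
    if dvar_x <= 0.0 or dvar_y <= 0.0:
        return 0.0  # a constant sample carries no information about dependence
    return math.sqrt(max(dcov2(a, b), 0.0) / math.sqrt(dvar_x * dvar_y))
```

For x on a symmetric grid and y = x², the Pearson correlation is 0 by symmetry, yet `distance_correlation(x, y)` comes out clearly positive; an exact linear relation such as y = 2x + 1 gives 1 up to rounding.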

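For the multiple testing entry, here is a sketch of two widely used corrections. Bonferroni controls the family-wise error rate by testing each of the m p-values at alpha/m. The Benjamini-Hochberg step-up procedure controls the false discovery rate by rejecting the k smallest p-values, where k is the largest rank with p_(k) <= k·alpha/m. This is our own illustration and not code from the review; the function names and the default alpha = 0.05 are assumptions.

```python
from typing import List, Sequence

# Illustrative sketch: function names and default alpha are assumptions.


def bonferroni(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices rejected when each p-value is tested at alpha / m (controls FWER)."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure (controls FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # rank 1 = smallest p-value
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank  # keep the LARGEST passing rank (step-up), do not stop at a failure
    return sorted(order[:k])
```

With p-values [0.001, 0.04, 0.045], Bonferroni (threshold about 0.0167) rejects only the first hypothesis, while Benjamini-Hochberg rejects all three: the second p-value misses its threshold of about 0.033, but the largest one passes 3·0.05/3, and the step-up rule keeps every hypothesis ranked at or below it.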
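
For the nearest positive semidefinite matrix entry, here is a sketch of the case we understand from Higham's result. In the Frobenius norm, the nearest symmetric PSD matrix to A is obtained by symmetrizing B = (A + A^T)/2, eigendecomposing B, and clipping negative eigenvalues to zero. The eigendecomposition uses the classical cyclic Jacobi method so the block runs with the standard library only. This is a swapped-in textbook method, not the paper's algorithm, which is stated via the polar decomposition. For real kernel matrices, `numpy.linalg.eigh` would be the usual choice. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

# Sketch only: hypothetical names; Jacobi eigensolver swapped in for clarity, not speed.


def jacobi_eigh(s: Matrix, max_sweeps: int = 50) -> Tuple[List[float], Matrix]:
    """Eigen-decompose a symmetric matrix; column k of V pairs with eigenvalue k."""
    n = len(s)
    a = [row[:] for row in s]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - sn * akq
                    a[k][q] = sn * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - sn * aqk
                    a[q][k] = sn * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - sn * vkq
                    v[k][q] = sn * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def nearest_psd(m: Matrix) -> Matrix:
    """Nearest symmetric PSD matrix (Frobenius norm) via eigenvalue clipping."""
    n = len(m)
    b = [[(m[i][j] + m[j][i]) / 2.0 for j in range(n)] for i in range(n)]  # symmetric part
    w, v = jacobi_eigh(b)
    w_plus = [max(x, 0.0) for x in w]  # clip negative eigenvalues to zero
    return [[sum(v[i][k] * w_plus[k] * v[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Applied to the indefinite matrix [[1, 2], [2, 1]] (eigenvalues 3 and -1), `nearest_psd` drops the negative eigen-direction and returns a matrix with every entry approximately 1.5. A matrix that is already symmetric PSD comes back unchanged up to rounding.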