### (2012) Unique mappability tracks for several species

I have generated per-base unique mappability tracks for a large range of read lengths for several key species.

Each directory corresponds to a particular assembly of a species and contains a file named globalmap_k<min>tok<max>.tgz. Unpacking this tar.gz archive produces a directory called global_k<min>tok<max>, which contains binary files representing unique mappability, one for each chromosome c \in C. Each track simultaneously encodes mappability at all read lengths from <min> to <max>.
(a) The files are in uint8 (unsigned 8-bit integer) binary format.
(b) Each file is a vector of unsigned 8-bit integers whose length equals the length of the chromosome. The elements of the vector are >= 0.
(c) A value of 'x' at a position means that the position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand.
(d) A value of 0 at a position means that the position is not unique for any of the k-mer lengths (k=<min> to <max>).
(e) To obtain the uniqueness map for a particular k, perform the following operation on the vector: (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the + strand map for read length <len> by <len-1> positions,
i.e. if position 1 is UNIQUE on the + strand for <len=3>, then position 3 is UNIQUE on the - strand.
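
The two operations described above (thresholding for a given k, then shifting for the - strand) can be sketched on a toy vector. This is only an illustration, not the code used to build the tracks: the vector values, the 10 bp "chromosome", the implied range <min>=2 to <max>=5, and the variable names vec, uniq_plus and uniq_minus are all made up for the example.

```matlab
% Toy mappability vector for an imaginary 10 bp chromosome (hypothetical values).
% 0 = not unique for any k; x > 0 = unique for all k-mers of length >= x
% starting at that position on the + strand.
vec = uint8([0 3 2 0 5 4 0 0 0 0]);

k = 3;  % read length of interest (assumed to lie between <min> and <max>)

% + strand uniqueness map for this k: true where a k-mer starting here is unique.
uniq_plus = (vec > 0) & (vec <= k);

% - strand uniqueness map: right-shift the + strand map by k-1 positions,
% padding the first k-1 positions with false and dropping the overhang.
shift = k - 1;
uniq_minus = [false(1, shift), uniq_plus(1:end-shift)];

disp(double(uniq_plus));   % 0 1 1 0 0 0 0 0 0 0
disp(double(uniq_minus));  % 0 0 0 1 1 0 0 0 0 0
```

Here position 2 is unique on the + strand for k=3, so position 4 is unique on the - strand, matching the rule in the last item above. On a real track, vec would be the full per-chromosome vector read from one of the binary files.
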

Example Usage

How to read the files in MATLAB
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome

tmp_uMap = fopen('chr1.uint8.unique','r');