
(2014) mod/mouse/human ENCODE: Blacklisted genomic regions for functional genomics analysis


What are these tracks?

Functional genomics experiments based on next-generation sequencing (e.g. ChIP-seq, MNase-seq, DNase-seq, FAIRE-seq) that measure the biochemical activity of various elements in the genome often produce artifactual signal in certain regions of the genome. It is important to keep track of and filter out these artifact regions, which tend to show artificially high signal (an excessive number of unstructured, anomalous read mappings). Below is a list of comprehensive empirical blacklists identified by the ENCODE and modENCODE consortia. Note that these blacklists were derived empirically from large compendia of data using a combination of automated heuristics and manual curation.

These blacklists apply to functional genomics data based on short-read sequencing (20-100 bp reads); they are not directly applicable to RNA-seq or other transcriptome data types. Blacklisted regions typically appear uniquely mappable, so simple mappability filters do not remove them. They are often found at specific types of repeats, such as centromeres, telomeres and satellite repeats. Removing them is especially important before computing measures of similarity between genome-wide tracks, such as Pearson correlation, which are strongly affected by outliers.
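The filtering step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an official ENCODE tool: blacklist regions and query intervals are treated as BED-style (chrom, start, end) tuples with 0-based, half-open coordinates, any interval overlapping a blacklisted region by at least one base is dropped, and the two signal tracks are invented toy data showing how a single artifact bin can dominate a Pearson correlation. The names `build_index`, `overlaps_blacklist` and `filter_intervals` are hypothetical.

```python
import bisect
import math
from typing import Dict, List, Sequence, Tuple

# BED-style interval: (chrom, start, end), 0-based start, half-open end.
Interval = Tuple[str, int, int]
Index = Dict[str, List[Tuple[int, int]]]


def build_index(regions: Sequence[Interval]) -> Index:
    """Sort and merge blacklist regions per chromosome so lookups can bisect."""
    index: Index = {}
    for chrom, start, end in sorted(regions):
        merged = index.setdefault(chrom, [])
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous region: extend it instead of appending.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return index


def overlaps_blacklist(index: Index, chrom: str, start: int, end: int) -> bool:
    """True if [start, end) shares at least one base with a blacklisted region."""
    regions = index.get(chrom, [])
    # Index of the last merged region whose start lies before the query end.
    i = bisect.bisect_left(regions, (end,)) - 1
    # Merged regions are disjoint and sorted, so only that region can reach past start.
    return i >= 0 and regions[i][1] > start


def filter_intervals(intervals: Sequence[Interval], index: Index) -> List[Interval]:
    """Keep only intervals (e.g. peaks or bins) that avoid every blacklisted region."""
    return [iv for iv in intervals if not overlaps_blacklist(index, *iv)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data (invented for illustration): two signal tracks in 100 bp bins on chr1,
# where the last bin falls in a hypothetical blacklisted region with artifact signal.
blacklist = build_index([("chr1", 900, 1000)])
bins = [("chr1", 100 * i, 100 * i + 100) for i in range(10)]
track_a = [1, 3, 2, 5, 4, 2, 3, 1, 4, 500]
track_b = [4, 1, 3, 2, 5, 3, 1, 4, 2, 480]

keep = [not overlaps_blacklist(blacklist, *b) for b in bins]
r_before = pearson(track_a, track_b)
r_after = pearson([a for a, k in zip(track_a, keep) if k],
                  [b for b, k in zip(track_b, keep) if k])
print(f"before masking: r = {r_before:.2f}")  # before masking: r = 1.00
print(f"after masking:  r = {r_after:.2f}")   # after masking:  r = -0.35
```

The any-overlap rule used here matches what `bedtools intersect -v` reports (entries in A with no overlap in B); for real data an established tool like that is likely the better choice, and a stricter or looser policy (e.g. a minimum overlap fraction) may suit some analyses.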

Downloads


NEW: VERSION 3 (05/20/2020)
For other species you can use the Version 2 blacklists at https://github.com/Boyle-Lab/Blacklist/tree/master/lists

VERSION 2 (06/28/2019)

The Version 2 blacklists have NOT been manually curated. They are available at https://github.com/Boyle-Lab/Blacklist/tree/master/lists
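A minimal Python sketch for reading a downloaded Version 2 list is shown below. It assumes the files are tab-separated BED, possibly gzip-compressed, with an optional fourth column giving the reason a region was flagged; the file name in the commented example is an assumption, so check the repository listing for the actual names. `BlacklistRegion`, `parse_bed` and `load_blacklist` are hypothetical names.

```python
import gzip
from typing import Iterable, List, NamedTuple


class BlacklistRegion(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # exclusive
    reason: str  # optional 4th column; empty string if the file has only 3 columns


def parse_bed(lines: Iterable[str]) -> List[BlacklistRegion]:
    """Parse BED lines, skipping blank, comment, track and browser lines."""
    regions = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        reason = fields[3] if len(fields) > 3 else ""
        regions.append(BlacklistRegion(fields[0], int(fields[1]), int(fields[2]), reason))
    return regions


def load_blacklist(path: str) -> List[BlacklistRegion]:
    """Read a blacklist BED file from disk, transparently handling .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_bed(handle)


# Example (file name assumed; download it from the lists directory first):
# regions = load_blacklist("hg38-blacklist.v2.bed.gz")
```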

VERSION 1 (deprecated; replaced by Version 2 above)
Blacklists for various species and genome versions can be downloaded from http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/

Who generated these tracks?

The original v1 human hg19 integrated blacklist was generated in 2011 by Anshul Kundaje as part of the ENCODE (phase 2) project and was curated manually.
The worm, fly and mouse blacklists and the GRCh38 human blacklist were generated by Alan Boyle and Anshul Kundaje as part of the ENCODE and modENCODE projects.

How should I cite these tracks?

If you use these tracks in any work, please cite:

  • Amemiya HM, Kundaje A, Boyle AP. The ENCODE blacklist: identification of problematic regions of the genome. Sci Rep. 2019 Dec;9(1):9354. doi: 10.1038/s41598-019-45839-z.
  • ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247.
You can also check out this paper, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3989762/ , which uses the human blacklist to examine artifacts in ChIP-seq and ChIP-exo data. However, please DO NOT cite it as the primary source of the blacklist.
