
(2014) ENCODE: Histone modification ChIP-seq uniform peak calls

Defining functional DNA elements in the human genome.
Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov GK, Ward LD, Birney E, Crawford GE, Dekker J, Dunham I, Elnitski LL, Farnham PJ, Feingold EA, Gerstein M, Giddings MC, Gilbert DM, Gingeras TR, Green ED, Guigo R, Hubbard T, Kent J, Lieb JD, Myers RM, Pazin MJ, Ren B, Stamatoyannopoulos JA, Weng Z, White KP, Hardison RC.
Proc Natl Acad Sci U S A. 2014 Apr 23.

Histone modification ChIP-seq datasets were processed to identify regions of ChIP enrichment relative to corresponding sequenced input-DNA controls. Read alignment files were filtered to discard multi-mapping reads and duplicates.

We used the MACS2 peak caller (v 2.0.10.20130712) to identify regions of enrichment over a wide range of signal strength. Enriched regions were scored on individual replicates, pooled data (reads pooled across replicates) and on subsampled pseudoreplicates (obtained by pooling reads from all replicates and randomly subsampling, without replacement, two pseudoreplicates with half the total number of pooled reads).
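The pseudoreplicate construction described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name `make_pseudoreplicates`, the `seed` parameter, and the representation of reads as items in Python lists are all assumptions made for the example. Shuffling the pooled reads and cutting the result in two is one way to subsample without replacement.

```python
import random


def make_pseudoreplicates(replicates, seed=0):
    """Pool reads from all replicates and split them into two pseudoreplicates.

    `replicates` is a list of read lists (hypothetical in-memory representation).
    Each pseudoreplicate receives half of the pooled reads, and no read is
    assigned to both, i.e. sampling is without replacement.
    """
    # Pool reads across all replicates.
    pooled = [read for rep in replicates for read in rep]

    # Shuffle with a seeded generator so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(pooled)

    # Cut the shuffled pool into two halves of equal size.
    # If the pool has an odd number of reads, one read is left out.
    half = len(pooled) // 2
    return pooled[:half], pooled[half:2 * half]
```

In practice the reads would come from filtered alignment files rather than Python lists, and the split would usually be done with streaming tools.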

We used MACS2 to identify three types of regions of enrichment: (i) narrow peaks of contiguous enrichment (narrowPeaks) that pass a Poisson p-value threshold of 0.01; (ii) broader regions of enrichment (broadPeaks) that pass a Poisson p-value threshold of 0.1 (using MACS2’s broad peak mode); (iii) gapped/chained regions of enrichment (gappedPeaks), defined as broadPeaks that contain at least one strong narrowPeak.
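To make the gappedPeak definition concrete, the sketch below keeps only those broad regions that fully contain at least one strong narrow peak. It illustrates the definition only; the gappedPeak files themselves were produced by MACS2. The function name `gapped_peaks`, the tuple layout `(chrom, start, end, score)`, and the `min_score` cutoff for what counts as "strong" are hypothetical, since the text does not state the cutoff.

```python
from collections import defaultdict


def gapped_peaks(broad_peaks, narrow_peaks, min_score):
    """Return broad peaks that contain at least one strong narrow peak.

    Peaks are (chrom, start, end, score) tuples, where score is assumed to be
    the negative log10 p-value. `min_score` is a hypothetical cutoff for
    "strong"; the source does not specify its value.
    """
    # Index strong narrow peaks by chromosome.
    strong = defaultdict(list)
    for chrom, start, end, score in narrow_peaks:
        if score >= min_score:
            strong[chrom].append((start, end))

    kept = []
    for chrom, start, end, score in broad_peaks:
        # Keep the broad peak if any strong narrow peak lies entirely inside it.
        if any(start <= s and e <= end for s, e in strong.get(chrom, ())):
            kept.append((chrom, start, end, score))
    return kept
```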

To obtain reliable regions of enrichment, we retained only those enriched regions identified in the pooled data that were also independently identified in both pseudoreplicates. The coverage and conservation analysis used only histone modification datasets from the Broad Institute Production group. We used the gappedPeak representation for the histone marks with relatively compact enrichment patterns: H3K4me3, H3K4me2, H3K4me1, H3K9ac, H3K27ac and H2A.Z.
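The reproducibility filter on pooled peaks can be sketched as below. The overlap criterion is an assumption: the text does not say how much overlap counts as "also identified", so this example treats any overlap of at least one base (with half-open coordinates) as a match. The function names are hypothetical.

```python
def overlaps_any(peak, others):
    """Return True if `peak` overlaps any peak in `others` on the same chromosome.

    Peaks are tuples starting with (chrom, start, end); coordinates are assumed
    to be half-open, and any overlap of one base or more counts.
    """
    chrom, start, end = peak[:3]
    return any(c == chrom and s < end and start < e for c, s, e, *_ in others)


def reproducible_peaks(pooled, pseudorep1, pseudorep2):
    """Keep pooled peaks that are also found in both pseudoreplicates."""
    return [
        peak
        for peak in pooled
        if overlaps_any(peak, pseudorep1) and overlaps_any(peak, pseudorep2)
    ]
```

The linear scan is adequate for small examples; genome-wide peak sets would normally be sorted and compared with an interval-based tool.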

For the diffuse histone marks, H3K36me3, H3K79me2, H3K27me3, H3K9me3 and H3K9me1, we used the broadPeak representation. These peak calls were deliberately not optimally thresholded, so as to allow analysis of genomic coverage over a wide range of signal enrichment.

The gappedPeak and broadPeak files can be downloaded from http://www.broadinstitute.org/~anshul/projects/encode/rawdata/peaks_histone/mar2012/broad/combrep_and_ppr/
The narrowPeak files (not used in any of the analyses) can be downloaded from http://www.broadinstitute.org/~anshul/projects/encode/rawdata/peaks_histone/mar2012/narrow/combrep_and_ppr/
The negative log10 Poisson p-values of enrichment, given in column 8 of the peak files, were used as peak scores in the coverage analysis.
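As a rough sketch of how the scores could be read, the snippet below pulls the negative log10 p-value from column 8 of a tab-separated peak file. The function name `read_peak_scores` and the sample line are made up for illustration; the column number follows the description above and is exposed as a parameter in case a file uses a different layout.

```python
import io


def read_peak_scores(handle, p_column=8):
    """Yield (chrom, start, end, neg_log10_p) from a tab-separated peak file.

    `p_column` is the 1-based column holding the negative log10 p-value.
    """
    for line in handle:
        line = line.strip()
        # Skip blank lines and any track, browser or comment header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[2]), float(fields[p_column - 1])


# Usage with an invented example line (not real data).
sample = io.StringIO("chr1\t100\t500\tpeak_1\t250\t.\t4.2\t12.5\t9.8\n")
for chrom, start, end, score in read_peak_scores(sample):
    # The raw p-value can be recovered as 10 ** -score.
    print(chrom, start, end, score, 10 ** -score)
```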

Additional details and step-by-step instructions coming soon.
