
(2013) Chromatin State Variation across humans


Extensive variation in chromatin states across humans.

Kasowski M*, Kyriazopoulou-Panagiotopoulou S*, Grubert F*, Zaugg JB*, Kundaje A*, Liu Y, Boyle AP, Zhang QC, Zakharia F, Spacek DV, Li J, Xie D, Olarerin-George A, Steinmetz LM, Hogenesch JB, Kellis M, Batzoglou S, Snyder M.
Science. 2013 Nov 8;342(6159):750-2. doi: 10.1126/science.1242510. Epub 2013 Oct 17.

The majority of disease-associated variants lie outside protein-coding regions, suggesting a link between variation in regulatory regions and disease predisposition. We studied differences in chromatin states using five histone modifications, cohesin, and CTCF in lymphoblastoid lines from 19 individuals of diverse ancestry. We found extensive signal variation in regulatory regions, which often switch between active and repressed states across individuals. Enhancer activity is particularly diverse among individuals, whereas gene expression remains relatively stable. Chromatin variability shows genetic inheritance in trios, correlates with genetic variation and population divergence, and is associated with disruptions of transcription factor binding motifs. Overall, our results provide insights into chromatin variation among humans.

Raw data

FASTQ files for ChIP-seq and RNA-seq data

Processed Data

All processed data are available at http://gbsc-share.stanford.edu/chromovar/rawdata/

  • alleleCounts: Information about allele-specific SNPs.
  • anova: Results of the ANOVA analysis for detecting regions with variable signal.
  • deseq2: Results of DESeq analysis for detecting regions with differential signal levels.
  • genomes: Fasta files with maternal and paternal genomes for all non-San individuals in the study. Genotype calls and BAM files with mapped sequencing reads for the San individuals are available through a data access agreement for transfer of genetic data by contacting the authors.
  • mapped: Mapped ChIP-seq and RNA-seq reads.
  • metadata: Information about the cell lines and experiments.
  • motifs: Data for correlating H3K27Ac signal with motif disruptions.
  • peakFiles: Peak calls.
  • popSpecific: Regions with signal showing patterns specific to ancestry groups.
  • signal: Genome-wide signal from the ChIP-seq experiments, as well as average signal on the consensus sets of peaks.
  • transcriptomes: Personal transcriptomes of the non-San individuals and gene annotations.
For the San individuals, genotype calls and BAM files with mapped sequencing reads are available through a data access agreement for transfer of genetic data. To request access, contact Michael Snyder.

The list of individuals and their identifiers

CELLTYPE  RELATIONSHIP  ANCESTRY     GENDER  ID
12878     Daughter      CEU          Female  C01
12891     Father        CEU          Male    C02
12892     Mother        CEU          Female  C03
19238     Mother        YRI          Female  C04
19239     Father        YRI          Male    C05
19240     Daughter      YRI          Female  C06
10847     Female        CEU          Female  C07
18505     Female        YRI          Female  C08
18526     Female        Han Chinese  Female  C09
18951     Female        Japanese     Female  C10
19099     Female        YRI          Female  C11
19193     Female        YRI          Female  C12
18486     Male          YRI          Male    C13
Snyder    Male          CEU          Male    C14
12890     Female        CEU          Female  C15
2255      Male          San          Male    C16
2588      Male          San          Male    C17
2610      Male          San          Male    C18
2630      Male          San          Male    C19

Chromatin State Maps

ChromHMM was used to learn combinatorial chromatin states jointly across all individuals. The model and resulting chromatin state maps are available here 

The figure summarizes the 15-state chromatin state model and the enrichments of various known annotations and features (e.g. TF ChIP-seq binding peaks from the GM12878 line from ENCODE) in each state.

The states are as follows:

STATE NO.  MNEMONIC  DESCRIPTION                         COLOR NAME  COLOR CODE (R,G,B)
1          TssA      Active TSS                          Red         255,0,0
2          TssF      Flanking Active TSS                 Orange Red  255,69,0
3          Tx        Strong transcription                Green       0,128,0
4          TxWk      Weak transcription                  DarkGreen   0,100,0
5          EnhA      Acetylated Active Enhancer          Orange      255,165,0
6          TxEnhA    Acetylated Active Enhancer (Genic)  Orange      255,165,0
7          Enh       Active Enhancer                     Gold        255,215,0
8          TxEnh     Active Enhancer (Genic)             Gold        255,215,0
9          EnhW      Weak Enhancer                       Yellow      255,255,0
10         TxEnhW    Weak Enhancer (Genic)               Yellow      255,255,0
11         TssP      Bivalent TSS                        DarkSalmon  233,150,122
12         EnhP      Bivalent Enhancer                   DarkKhaki   189,183,107
13         ReprPC    Repressed PolyComb                  Grey        105,105,105
14         Ctcf      CTCF only                           Black       0,0,0
15         Low       Quiescent/Low                       White       255,255,255
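
For scripting, the state table above can be carried around as a small lookup. The sketch below is only a transcription of that table into Python; the names STATES, rgb_to_hex and MNEMONIC_TO_HEX are illustrative and are not part of the released files.

```python
# State number -> (mnemonic, description, (R, G, B)), transcribed from the table above.
STATES = {
    1: ("TssA", "Active TSS", (255, 0, 0)),
    2: ("TssF", "Flanking Active TSS", (255, 69, 0)),
    3: ("Tx", "Strong transcription", (0, 128, 0)),
    4: ("TxWk", "Weak transcription", (0, 100, 0)),
    5: ("EnhA", "Acetylated Active Enhancer", (255, 165, 0)),
    6: ("TxEnhA", "Acetylated Active Enhancer (Genic)", (255, 165, 0)),
    7: ("Enh", "Active Enhancer", (255, 215, 0)),
    8: ("TxEnh", "Active Enhancer (Genic)", (255, 215, 0)),
    9: ("EnhW", "Weak Enhancer", (255, 255, 0)),
    10: ("TxEnhW", "Weak Enhancer (Genic)", (255, 255, 0)),
    11: ("TssP", "Bivalent TSS", (233, 150, 122)),
    12: ("EnhP", "Bivalent Enhancer", (189, 183, 107)),
    13: ("ReprPC", "Repressed PolyComb", (105, 105, 105)),
    14: ("Ctcf", "CTCF only", (0, 0, 0)),
    15: ("Low", "Quiescent/Low", (255, 255, 255)),
}


def rgb_to_hex(rgb):
    """Convert an (R, G, B) tuple into a '#RRGGBB' string."""
    r, g, b = rgb
    return f"#{r:02X}{g:02X}{b:02X}"


# Mnemonic -> hex colour, convenient for plot legends.
MNEMONIC_TO_HEX = {mnemonic: rgb_to_hex(rgb) for mnemonic, _, rgb in STATES.values()}
```

Note that the genic and non-genic enhancer states share colors, so color alone does not identify a state.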

The following result files will be useful to most users:

(1) MNEMONICS BED FILES
Files named C[ID].[CellLine]_15_Core_mnemonics.bed.gz
- Tab-delimited, 4 columns
- chromosome, start (0-based), stop (1-based, i.e. the standard half-open BED interval), state_label_mnemonic for that region
You can download an archive containing all the mnemonics.bed files from
http://www.broadinstitute.org/~anshul/projects/chromatinVariation/segmentations/chmmResults/14indivCore/final/all.mnemomics.bed.tgz
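
As an illustration of working with the 4-column mnemonics format described above, the sketch below sums the number of base pairs assigned to each state in one file. It assumes only what is stated here (gzip-compressed, tab-delimited, columns chromosome/start/stop/mnemonic); the function name state_coverage is illustrative, and skipping "track"/"browser"/"#" lines is a defensive assumption, since the files may not contain any header lines.

```python
import gzip
from collections import Counter


def state_coverage(path):
    """Return a Counter mapping state mnemonic -> total base pairs covered."""
    coverage = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and possible header lines (assumption, see above).
            if not line or line.startswith(("track", "browser", "#")):
                continue
            chrom, start, end, state = line.split("\t")[:4]
            # BED intervals are half-open, so the length is simply end - start.
            coverage[state] += int(end) - int(start)
    return coverage


# Usage, with a placeholder for a file extracted from the archive:
# coverage = state_coverage("path/to/one_mnemonics.bed.gz")
# for state, bp in coverage.most_common():
#     print(state, bp)
```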

(2) BROWSER FRIENDLY FILES
Files named C[ID].[CellLine]_15_Core_dense.bb
The dense BIGBED files allow you to view each individual's chromatin state map as a single track, with regions labeled with state mnemonics and representative colors. You can stream these to the UCSC Genome Browser or IGV.
*.bb files in http://www.broadinstitute.org/~anshul/projects/chromatinVariation/segmentations/chmmResults/14indivCore/final/

Files named C[ID].[CellLine]_15_Core_dense.bed.gz
Same as above except in text format
You can download an archive containing all the dense BED files from
http://www.broadinstitute.org/~anshul/projects/chromatinVariation/segmentations/chmmResults/14indivCore/final/all.dense.bed.tgz

Files named C[ID].[CellLine]_15_Core_expanded.bed.gz
The expanded files allow you to view each individual's chromatin state map with each state as a separate track, labeled with state mnemonics and representative colors.
You can download an archive containing all the expanded files from
http://www.broadinstitute.org/~anshul/projects/chromatinVariation/segmentations/chmmResults/14indivCore/final/all.expanded.bed.tgz

(3) STATES FOR EACH 200bp BIN
Maximum-posterior state label for each 200 bp bin in each chromosome for all epigenomes. The difference from the mnemonics BED files is that, in the mnemonics files, contiguous bins with the same state label are merged and a single label is assigned to the entire merged region, so the regions are of variable size. The files below are at a fixed 200 bp resolution.
http://www.broadinstitute.org/~anshul/projects/chromatinVariation/segmentations/chmmResults/14indivCore/final/all.statesByBin.tgz
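
The relationship between the fixed-resolution labels and the merged regions can be sketched as follows. This is a minimal illustration on an in-memory list of labels, not a reader for the statesByBin files, whose on-disk layout is not described here; merge_bins and its arguments are illustrative names.

```python
from itertools import groupby

BIN_SIZE = 200  # bp, the fixed resolution stated above


def merge_bins(chrom, labels, bin_size=BIN_SIZE):
    """Collapse runs of identical per-bin labels into (chrom, start, end, label) regions.

    Coordinates follow the BED convention: 0-based start, exclusive end.
    """
    regions = []
    position = 0
    for label, run in groupby(labels):
        n_bins = sum(1 for _ in run)
        end = position + n_bins * bin_size
        regions.append((chrom, position, end, label))
        position = end
    return regions


example = merge_bins("chr1", ["Low", "Low", "TssA", "TssF", "TssF", "TssF"])
# example == [("chr1", 0, 400, "Low"), ("chr1", 400, 600, "TssA"), ("chr1", 600, 1200, "TssF")]
```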

(4) POSTERIOR PROBABILITY FOR EACH 200bp BIN
Posterior probabilities of each state in each 200 bp bin for all chromosomes in all epigenomes.
http://www.broadinstitute.org/~anshul/projects/chromatinVariation/segmentations/chmmResults/14indivCore/final/POSTERIOR/
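
The per-bin state labels in (3) are the maximum-posterior calls derived from probabilities like these. A minimal sketch of that step is given below on an in-memory table of probabilities (one row per bin, one column per state). The layout of the POSTERIOR files is not described here, and the sketch assumes the columns follow the state numbering of the table above, which should be checked against the files before use; max_posterior_states is an illustrative name.

```python
def max_posterior_states(posteriors):
    """For each bin (a row of per-state probabilities), return the 1-based
    number of the most probable state. Ties go to the lowest state number."""
    calls = []
    for row in posteriors:
        best_index = max(range(len(row)), key=row.__getitem__)
        calls.append(best_index + 1)
    return calls


# Toy example with 3 states instead of 15, for brevity.
toy = [
    [0.1, 0.7, 0.2],  # bin 1: state 2 is most probable
    [0.6, 0.3, 0.1],  # bin 2: state 1 is most probable
]
calls = max_posterior_states(toy)
# calls == [2, 1]
```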