Genomic sequence variation

1000 Genomes Project
Data collection and a catalog of human variation

dbSNP
A catalog of SNPs and short indels

dbVar and Database of Genomic Variants
A catalog of structural variants. Browser track: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=dgvPlus

Online Mendelian Inheritance in Man (OMIM)
OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype and genotype, and its entries contain copious links to other genetics resources.

ExAC (http://exac.broadinstitute.org/)
ExAC is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set provided on this website spans 61,486 unrelated individuals sequenced as part of various disease-specific and population genetic studies. We have removed individuals affected by severe pediatric disease, so this data set should serve as a useful reference set of allele frequencies for severe disease studies. All of the raw data from these projects have been reprocessed through the same pipeline, and jointly variant-called to increase consistency across projects.

gnomAD
The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community.

UK Biobank
UK Biobank is a national and international health resource with unparalleled research opportunities, open to all bona fide health researchers.
UK Biobank aims to improve the prevention, diagnosis and treatment of a wide range of serious and life-threatening illnesses, including cancer, heart diseases, stroke, diabetes, arthritis, osteoporosis, eye disorders, depression and forms of dementia. It is following the health and well-being of 500,000 volunteer participants and provides health information, which does not identify them, to approved researchers in the UK and overseas, from academia and industry. This dataset requires explicit access permission. However, GWAS summary statistics for several traits and diseases are available at http://www.nealelab.is/uk-biobank. Other useful information is available at https://biobankengine.stanford.edu/

Molecular function

Encyclopedia Of DNA Elements (ENCODE) Project
The Encyclopedia of DNA Elements (ENCODE) Consortium is an ongoing international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active. ENCODE investigators employ a variety of assays and methods to identify functional elements. The discovery and annotation of gene elements is accomplished primarily by sequencing a diverse range of RNA sources, comparative genomics, integrative bioinformatic methods, and human curation. Regulatory elements are typically investigated through DNA hypersensitivity assays, assays of DNA methylation, and immunoprecipitation (IP) of proteins that interact with DNA and RNA, i.e., modified histones, transcription factors, chromatin regulators, and RNA-binding proteins, followed by sequencing.
Roadmap Epigenomics Project (NIH Common Fund)
Data collection, integrative analysis and a resource of human epigenomic data. Uniformly processed data: http://compbio.mit.edu/roadmap

BLUEPRINT Epigenome
Data collection on the epigenome of blood cells

International Human Epigenome Consortium (IHEC)
Data collection and reference maps of human epigenomes for key cellular states relevant to health and diseases. IHEC includes ENCODE, Roadmap, Blueprint and many others. The Epigenome Reference Registry, aka EpiRR, serves as a registry for datasets grouped in reference epigenomes and their respective metadata, including direct links to the raw data in public sequence archives. IHEC reference epigenomes must meet the IHEC minimum criteria, and any associated metadata should comply with the IHEC specifications.

ChIP-Atlas
ChIP-Atlas is an integrative and comprehensive database for visualizing and making use of public ChIP-seq data. ChIP-Atlas covers almost all public ChIP-seq data submitted to the SRA (Sequence Read Archive) in NCBI, DDBJ, or ENA, and is based on over 118,000 experiments.

CISTROME
Uniformly processed collection of ChIP-seq, DNase-seq and ATAC-seq datasets across multiple species

ReMap: An integrative ChIP-seq data portal
ReMap is an integrative analysis of ChIP-seq experiments on transcriptional regulators from both public and ENCODE datasets. The ReMap atlas consists of 80 million peaks from 485 transcription factors (TFs), transcription coactivators (TCAs) and chromatin-remodeling factors (CRFs). The atlas is available to browse or download either for a given TF or cell line, or for the entire dataset.

ARCHS4: Massive Mining of Publicly Available RNA-seq Data from Human and Mouse
ARCHS4 provides access to gene counts from HiSeq 2000, HiSeq 2500 and NextSeq 500 platforms for human and mouse experiments from GEO and SRA.
The website enables downloading of the data in H5 format for programmatic access as well as a 3-dimensional view of the sample and gene spaces. Search features allow browsing of the data by metadata annotation, submitting your own up and down gene sets, and exploring matching samples enriched for annotated gene sets. Selected sample sets can be downloaded into a tab-separated text file through auto-generated R scripts for further analysis. Reads are aligned with Kallisto using a custom cloud computing platform. Human samples are aligned against the GRCh38 human reference genome, and mouse samples against the GRCm38 mouse reference genome.

RECOUNT2: A multi-experiment resource of analysis-ready RNA-seq gene and exon count datasets
recount2 is an online resource consisting of RNA-seq gene and exon counts as well as coverage bigWig files for 2041 different studies. It is the second generation of the ReCount project. The raw sequencing data were processed with Rail-RNA, as described in the recount2 paper and in Nellore et al, Genome Biology, 2016, which created the coverage bigWig files. For ease of statistical analysis, for each study we created count tables at the gene and exon levels and extracted phenotype data, which we provide in their raw formats as well as in RangedSummarizedExperiment R objects (described in the SummarizedExperiment Bioconductor package). We also computed the mean coverage per study and provide it in a bigWig file, which can be used with the derfinder Bioconductor package to perform annotation-agnostic differential expression analysis at the expressed-region level, as described in Collado-Torres et al, Nucleic Acids Research, 2017. The count tables, RangedSummarizedExperiment objects, phenotype tables, sample bigWigs, mean bigWigs, and file information tables are ready to use and freely available. We also created the recount Bioconductor package, which allows you to search and download the data for a specific study.
By taking care of several preprocessing steps and combining many datasets into one easily accessible website, we make finding and analyzing RNA-seq data considerably more straightforward.

FANTOM5 Project
Large collection of CAGE-based expression data across multiple species (time series and perturbations)

Human BodyMap
Gene expression database from Illumina, from RNA-seq data. Viewable with Ensembl (http://www.ensembl.org/index.html) or the Integrative Genomics Viewer (http://www.broadinstitute.org/igv/)

ArrayExpress
Database of gene expression experiments

Gene Expression Atlas
Database supporting queries of condition-specific gene expression on a curated subset of the ArrayExpress Archive.

GNF Gene Expression Atlas
GNF (Genomics Institute of the Novartis Research Foundation) human and mouse gene expression array data. Viewable at BioGPS (http://biogps.org/#goto=welcome)

The Human Protein Atlas
Protein expression profiles based on immunohistochemistry for a large number of human tissues, cancers and cell lines, along with subcellular localization and transcript expression levels

UniProt
A comprehensive, freely accessible database of protein sequence and functional information

InterPro
An integrated database of protein classification, functional domains, and annotation (including GO terms).
Protein Capture Reagents Initiative
Resource generation: renewable, monoclonal antibodies and other reagents that target the full range of proteins

Knockout Mouse Program (KOMP)
Resource generation: create knockout strains for all mouse genes. Trans-NIH project

Cancer Cell Line Encyclopedia (CCLE)
Gene expression data, CNV, mutations and perturbations over a huge collection of cell lines

The Connectivity Map (CMAP)
The Connectivity Map (also known as cmap) is a collection of genome-wide transcriptional expression data from cultured human cells treated with bioactive small molecules and simple pattern-matching algorithms that together enable the discovery of functional connections between drugs, genes and diseases through the transitory feature of common gene-expression changes. You can learn more about cmap from our papers in Science and Nature Reviews Cancer.

DepMap: The Cancer Dependency Map Project
The goal of the Cancer Dependency Map Project is to systematically catalog and identify biomarkers of genetic vulnerabilities and drug sensitivities in hundreds of cancer models and tumors, to accelerate the development of precision treatments.
Library of Integrated Network-based Cellular Signatures (LINCS)
Data collection and analysis of molecular signatures that describe how different types of cells respond to a variety of perturbing agents

Genomics of Drug Sensitivity in Cancer
Mutation, CNV, Affy expression and drug sensitivity in ~300 cancer cell lines. Papers: http://nar.oxfordjournals.org/content/41/D1/D955.long , http://www.nature.com/nature/journal/v483/n7391/full/nature11005.html

The Drug Gene Interaction database (DGIdb)

Molecular Libraries Program (MLP)
Access to the large-scale screening capacity necessary to identify small molecules that can be optimized as chemical probes to study the functions of genes, cells, and biochemical pathways in health and disease

Allen Brain Atlas
Data collection and an online public resource integrating extensive gene expression and neuroanatomical data for human and mouse, including variation of mouse gene expression by strain.

BrainCloud
BrainCloud is a freely available, biologist-friendly, stand-alone application for exploring the temporal dynamics and genetic control of transcription in the human prefrontal cortex across the lifespan. BrainCloud was developed through collaboration between the Lieber Institute and NIMH.

The Human Connectome Project
Data collection and integration to create a complete map of the structural and functional neural connections, within and across individuals

Geuvadis RNA sequencing project of 1000 Genomes samples
mRNA and small RNA sequencing on 465 lymphoblastoid cell line (LCL) samples from 5 populations of the 1000 Genomes Project: the CEPH (CEU), Finns (FIN), British (GBR), Toscani (TSI) and Yoruba (YRI).

The Achilles Project
Project Achilles is a systematic effort aimed at identifying and cataloging genetic vulnerabilities across hundreds of genomically characterized cancer cell lines. The project uses a genome-wide shRNA library to silence individual genes and identify those genes that affect cell survival.
Large-scale functional screening of cancer cell lines provides a complementary approach to those studies that aim to characterize the molecular alterations (mutations, copy number alterations, etc.) of primary tumors, such as The Cancer Genome Atlas. The overall goal of the project is to link cancer genetic dependencies to their molecular characteristics in order to identify molecular targets and guide therapeutic development.

Broad Institute's Single Cell Portal
The Single Cell Portal was developed to facilitate sharing scientific results and disseminating data generated from single-cell technologies

Human Cell Atlas
The HCA Data Portal stores and provides single-cell data contributed by labs around the world. Anyone can contribute data, find data, or access community tools and applications

Phenotypes and disease

Human Ageing Genomic Resources

The Cancer Genome Atlas (TCGA)
Data collection and a data repository, including cancer genome sequence data

International Cancer Genome Consortium (ICGC)
Data collection and a data repository for a comprehensive description of genomic, transcriptomic and epigenomic changes of cancer

Genotype-Tissue Expression (GTEx) Project
Data collection, data repository, and sample bank for human gene expression and regulation in multiple tissues, compared to genetic variation

Knockout Mouse Phenotyping Program (KOMP2)
Data collection for standardized phenotyping of a genome-wide collection of mouse knockouts

Database of Genotypes and Phenotypes (dbGaP)
Data repository for results from studies investigating the interaction of genotype and phenotype

NHGRI Catalog of Published GWAS
Public catalog of published Genome-Wide Association Studies

Clinical Genomic Database
A manually curated database of conditions with known genetic causes, focusing on medically significant genetic data with available interventions.
NHGRI's Breast Cancer Information Core
Breast cancer mutation database

ClinVar
ClinVar is designed to provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar collects reports of variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in submissions are mapped to reference sequences, and reported according to the HGVS standard. ClinVar then presents the data for interactive users as well as those wishing to use ClinVar in daily workflows and other local applications. ClinVar works in collaboration with interested organizations to meet the needs of the medical genetics community as efficiently and effectively as possible.

Human Gene Mutation Database (HGMD)
The Human Gene Mutation Database (HGMD®) represents an attempt to collate known (published) gene lesions responsible for human inherited disease.

NHLBI Exome Sequencing Project (ESP) Exome Variant Server
The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein-coding regions of the human genome across diverse, richly phenotyped populations, and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders.

Genetics Home Reference
Genetics Home Reference is the National Library of Medicine's web site for consumer information about genetic conditions and the genes or chromosomes related to those conditions.
GeneReviews
GeneReviews are expert-authored, peer-reviewed disease descriptions presented in a standardized format and focused on clinically relevant and medically actionable information on the diagnosis, management, and genetic counseling of patients and families with specific inherited conditions.

Global Alzheimer's Association Interactive Network (GAAIN)
The Global Alzheimer’s Association Interactive Network (GAAIN) is a collaborative project that will provide researchers around the globe with access to a vast repository of Alzheimer’s disease research data and the sophisticated analytical tools and computational power needed to work with that data. Our goal is to transform the way scientists work together to answer key questions related to understanding the causes, diagnosis, treatment and prevention of Alzheimer’s and other neurodegenerative diseases. In 2013, WGS data were obtained for the largest such cohort, 800 Alzheimer's patients.

The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium
The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium was formed to facilitate genome-wide association study meta-analyses and replication opportunities among multiple large and well-phenotyped longitudinal cohort studies. CHARGE also has DNA methylation data alongside WGS and exome sequencing data.

The NIMH Center for Collaborative Genomic Studies on Mental Disorders (includes the Psychiatric Genomics Consortium, https://pgc.unc.edu/)
The NIMH Center, now known as the NIMH Repository and Genomics Resource (NIMH-RGR), plays a key role in facilitating psychiatric genetic research by providing a collection of over 150,000 well-characterized, high-quality patient and control samples from a wide range of mental disorders.
Data integration

UCSC Genome Bioinformatics
Genome databases displayed through a genome browser for vertebrates, other eukaryotes, and prokaryotes, including sequence conservation, transcript maps and expression, functional annotation, genetic variation, and human disease information

Ensembl
Genome databases displayed through a genome browser for vertebrates and other eukaryotic species, including sequence conservation, transcript maps and expression, functional annotation, genetic variation, and human disease information

Reactome
Pathway database: open-source, open-access, manually curated and peer-reviewed

Molecular Signatures Database (MSigDB)
MSigDB is a collection of annotated gene sets for use with Gene Set Enrichment Analysis (GSEA) software

KEGG: Kyoto Encyclopedia of Genes and Genomes
Database of pathways, diseases, drugs

BIOCARTA
Pathway analysis resource

Genomatix
Proprietary genome annotation and pathway analysis software

GOLD: Genomes Online Database
Information regarding genome and metagenome sequencing projects, and their associated metadata, around the world

ImmPort: Immunology Database and Analysis Portal
The ImmPort system provides advanced information technology support in the production, analysis, archiving, and exchange of scientific data for the diverse community of life science researchers supported by NIAID/DAIT. It serves as a long-term, sustainable archive of data generated by investigators funded through the NIAID/DAIT. The core component of the ImmPort system is an extensive data warehouse containing an integration of experimental data supplied by NIAID/DAIT-funded investigators and genomic, proteomic, and other data relevant to the research of these programs extracted from a variety of public databases. The ImmPort system also provides data analysis tools and an immunology-focused ontology.
Model organism databases (selected examples)

Mouse Genome Informatics
Includes genotypes with phenotype annotations, human diseases with one or more mouse models, expression assays and images, pathways, and refSNPs

Rat Genome Database (RGD)
Repository of rat genetic and genomic data, as well as mapping, strain, and physiological information

FlyBase
A Database of Drosophila Genes & Genomes

WormBase
The genetics, genomics and biology of C. elegans and related nematodes

The Zebrafish Model Organism Database (ZFIN)
Supports integrated zebrafish genetic, genomic and developmental information

XenBase
Xenopus laevis and Xenopus tropicalis biology and genomics resource

Saccharomyces Genome Database (SGD)
Integrated biological information for budding yeast, along with search and analysis tools |
Datasets
Dataset | Description | URL | Type |
---|---|---|---|
ImmGen/DMAP RNA data in human and mouse | 38 human samples and ~240 mouse samples | http://www.pnas.org.libproxy.mit.edu/content/110/8/2946.full | Biology, RNA |
A Catalog of Published Genome-Wide Association Studies | GWAS catalog | http://www.genome.gov/26525384 | Biology |
MicroRNA expression profiles for the NCI-60 cancer cell panel | NCI60 microRNA dataset | http://discover.nci.nih.gov/host/2007_microrna_abstract.jsp | Biology |
A Gene Expression Database for the Molecular Pharmacology of Cancer | NCI60 datasets measuring mRNA, protein expression, miRNA expression, chromosomal aberrations and drug response in 60 diverse cancer cell lines | http://discover.nci.nih.gov/nature2000/natureintromain.jsp | Biology |