The IGN Corpus

The Instance-level Gene Normalization (IGN) corpus was compiled from two datasets, one for abstract-level and the other for full-text-level evaluation. For each article, in addition to annotations of all gene/gene product mentions, the IGN corpus includes the following annotations:

The corresponding Entrez Gene ID of each human gene mention,
Species information of each gene mention,
Gene full name/abbreviation pairs,
Co-reference of gene mentions, and
Sentence boundaries.

The IGN corpus is intended to be released in the BioC XML format as a publicly available resource for other text mining systems.
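
As an illustration of how a downstream system might consume such a release, the following is a minimal sketch that reads gene annotations from a BioC-style XML fragment using only the Python standard library. The element layout (collection, document, passage, annotation, infon, location) follows the general BioC structure, but the sample document and the infon keys "type", "NCBI Gene", and "species" are assumptions made for this example and are not taken from the IGN corpus itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical BioC-style fragment. The infon keys ("type", "NCBI Gene",
# "species") are assumed for illustration; the actual IGN keys may differ.
SAMPLE = """<collection>
  <source>hypothetical</source>
  <date></date>
  <key></key>
  <document>
    <id>doc-1</id>
    <passage>
      <offset>0</offset>
      <text>BRCA1 interacts with p53.</text>
      <annotation id="G1">
        <infon key="type">Gene</infon>
        <infon key="NCBI Gene">672</infon>
        <infon key="species">9606</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_gene_annotations(xml_text):
    """Return one dict per annotation found in a BioC-style XML string."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for ann in doc.iter("annotation"):
            # Collect the key/value infon pairs attached to this annotation.
            infons = {i.get("key"): i.text for i in ann.findall("infon")}
            loc = ann.find("location")
            rows.append({
                "doc_id": doc_id,
                "mention": ann.findtext("text"),
                "gene_id": infons.get("NCBI Gene"),   # assumed key name
                "species": infons.get("species"),     # assumed key name
                "offset": int(loc.get("offset")) if loc is not None else None,
                "length": int(loc.get("length")) if loc is not None else None,
            })
    return rows


rows = read_gene_annotations(SAMPLE)
for row in rows:
    print(row)
```

Keeping each mention as a flat record with its document ID, character offset, and length is one simple way to preserve the instance-level granularity that the corpus annotates; other representations are equally possible.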