The IGN Corpus

The Instance-level Gene Normalization (IGN) corpus was compiled from two datasets, one for abstract-level evaluation and the other for full-text-level evaluation. For each article, in addition to annotations of all gene/gene product mentions, the IGN corpus includes the following annotations:

  1. The corresponding Entrez Gene ID of each human gene mention,

  2. Species information of each gene mention,

  3. Gene full name/abbreviation pairs,

  4. Co-reference of gene mentions, and

  5. Sentence boundaries.

The IGN corpus will be released in the BioC XML format as a publicly available resource for other text mining systems.
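
To illustrate how such annotations might be consumed, the following is a minimal sketch that reads gene mentions from a BioC-style XML fragment using only the Python standard library. The element layout (collection, document, passage, annotation, infon, location) follows the general BioC structure, but the infon keys used here ("type", "entrez_gene_id", "species"), the document ID, and the sample sentence are assumptions made for illustration and are not taken from the released corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical BioC-style fragment; the infon keys are illustrative assumptions,
# not the actual keys used in the IGN corpus.
SAMPLE = """<collection>
  <source>illustrative example</source>
  <document>
    <id>0000000</id>
    <passage>
      <offset>0</offset>
      <text>Breast cancer 1 (BRCA1) is mutated.</text>
      <annotation id="G1">
        <infon key="type">Gene</infon>
        <infon key="entrez_gene_id">672</infon>
        <infon key="species">9606</infon>
        <location offset="0" length="15"/>
        <text>Breast cancer 1</text>
      </annotation>
      <annotation id="G2">
        <infon key="type">Gene</infon>
        <infon key="entrez_gene_id">672</infon>
        <infon key="species">9606</infon>
        <location offset="17" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def parse_annotations(xml_text):
    """Return one dict per <annotation> element found in the XML string."""
    root = ET.fromstring(xml_text)
    results = []
    for ann in root.iter("annotation"):
        location = ann.find("location")
        results.append({
            "id": ann.get("id"),
            "text": ann.findtext("text"),
            "offset": int(location.get("offset")),
            "length": int(location.get("length")),
            # Collect all key/value infons attached to this mention.
            "infons": {i.get("key"): i.text for i in ann.findall("infon")},
        })
    return results


annotations = parse_annotations(SAMPLE)
```

In this sketch, the full name and its abbreviation share the same gene identifier, which is one simple way a downstream system could group them; how the corpus itself encodes full name/abbreviation pairs and co-reference links is not specified here.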