Index of Coincidence

The Index of Coincidence (IC) is a method of determining whether cipher text was created using a single alphabet (monoalphabetic, such as Simple Substitution) or multiple alphabets (polyalphabetic, such as Vigenère, Beaufort, Porta etc.). If the cipher was created using multiple alphabets, the IC can also indicate the number of alphabets used, that is, the cipher period.

It was developed by the U.S. Government cryptographer William F. Friedman (1891-1969) in the 1920s. The technique was later declassified and published by the author at the National Security Agency in the publication Military Cryptanalytics, Part I in 1956. The IC is the probability that two letters chosen at random from the message are the same. English text has a value of around 0.0667, and the number of alphabets whose IC is statistically closest to this figure is likely to be the correct cipher period.

The formula to calculate the Index of Coincidence is:

$$IC = \frac{\sum_{i=A}^{Z} f_i (f_i - 1)}{N (N - 1)}$$

where f_i is the number of times letter i (A to Z) occurs in the cipher text and N is the length of the cipher text.

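The IC can be normalized by dividing the N(N - 1) term by 26 (the number of letters in the alphabet), which is the same as multiplying the IC by 26. On this normalized scale English text scores around 1.73 and uniformly random letters score around 1.0. The values quoted elsewhere on this page, such as 0.0667, are not normalized.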
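As a rough illustration of the calculation, the following Python sketch counts each letter, sums f(f - 1) over the counts and divides by N(N - 1). The function name index_of_coincidence and its normalized flag are hypothetical names chosen for this example and are not part of CryptoCrack. It assumes a 26-letter A to Z alphabet and ignores every other character.

```python
from collections import Counter


def index_of_coincidence(text, normalized=False):
    """Return the IC of the A-Z letters in text (hypothetical helper)."""
    # Keep only ASCII letters and fold to upper case; spaces, digits and
    # punctuation are ignored (an assumption made for this sketch).
    letters = [c.upper() for c in text if c.isascii() and c.isalpha()]
    n = len(letters)
    if n < 2:
        # No pair of letters can be drawn, so report 0.0 by convention.
        return 0.0
    counts = Counter(letters)
    # Sum of f(f - 1) over all letters, divided by N(N - 1).
    ic = sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))
    # Normalizing multiplies by the alphabet size, 26.
    return ic * 26 if normalized else ic
```

For example, index_of_coincidence("AABB") gives 4/12, since each of the two letters contributes 2 x 1 to the sum and N(N - 1) is 4 x 3. Short texts give unreliable values, so a result is only meaningful for a reasonably long cipher text.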

If the Index of Coincidence (IC) of some unknown English cipher text is calculated to be around 0.067, it suggests the cipher text is monoalphabetic. If, however, only every 2nd letter is counted instead of every letter, the result is the IC for a period 2 cipher. Similarly, counting every 3rd letter gives the IC for a period 3 cipher, and so on. The number of alphabets whose IC is closest to 0.067 is likely to be the number of different alphabets used, that is, the period. This is how the IC can be used to check whether a cipher is polyalphabetic and, if it is, to determine its period.
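The period test can be sketched in Python as follows. This is a minimal illustration, not CryptoCrack's own code: the names average_ic_by_period and likely_period are hypothetical, and averaging the IC over all of the every-p-th-letter subsequences for a candidate period p is an assumption made here (a common way to apply the test). The target of 0.0667 is the English value quoted earlier.

```python
from collections import Counter


def _ic(letters):
    """IC of a sequence of letters: sum of f(f - 1) divided by N(N - 1)."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def average_ic_by_period(text, max_period=20):
    """Map each candidate period 1..max_period to its average IC."""
    # Keep only A-Z letters (an assumption made for this sketch).
    letters = [c.upper() for c in text if c.isascii() and c.isalpha()]
    table = {}
    for period in range(1, max_period + 1):
        # Column k holds letters k, k + period, k + 2 * period, ...
        # If period is correct, each column was enciphered with one alphabet.
        columns = [letters[start::period] for start in range(period)]
        table[period] = sum(_ic(col) for col in columns) / period
    return table


def likely_period(text, max_period=20, english_ic=0.0667):
    """Return the candidate period whose average IC is closest to english_ic."""
    table = average_ic_by_period(text, max_period)
    return min(table, key=lambda p: abs(table[p] - english_ic))
```

Calling average_ic_by_period(cipher_text) returns a table of periods 1 to 20 that can be inspected by eye. Multiples of the true period tend to score about as well as the period itself, so the value returned by likely_period should be treated as a candidate rather than a definite answer.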

CryptoCrack calculates the IC for between 1 and 20 alphabets and displays the results in a table. The most likely period for the given cipher is highlighted in red, though this could be a multiple of the correct period.