Issues with STR-Based Calculations of Time to an MRCA Shared by Two Men

[This page, first posted on this website in 2014, is posted here as a matter of historical interest.]

Michal Milewski's SNP-based calculations of time to a Most Recent Common Ancestor ("MRCA") for clusters of R1a1a Ashkenazi Levites are inconsistent with the STR-based calculations of time to an MRCA shared by two men as predicted by Family Tree DNA's TiP reports and the Y-Utility (which allows the calculation of an estimated time to an MRCA based upon varying mutation rates and probabilities).

In some cases, the TiP reports and the Y-Utility predict that two men share an MRCA within perhaps the past 300 to 400 years, when the SNP-based calculations show that the MRCA must have lived 1,000 years ago or more. These discrepancies exist even when the TiP reports predict that the likelihood of a shared ancestor within a specified number of generations is 95% or more, and even when the Y-Utility's calculations are based on a 95% probability.

These discrepancies appear to reflect three facts: (1) all R1a1a Ashkenazi Levites share an MRCA within the past 1,500 years or so; (2) the TiP reports and the Y-Utility depend heavily on both the number of deviations in marker values between two men and the particular markers on which they deviate (deviations on slow-mutating markers apparently are interpreted as indicating a longer time to an MRCA than are deviations on fast-mutating markers); and (3) because Y-DNA STR mutations occur randomly, some men's lines will have disproportionately few mutations (resulting in an underprediction of time to an MRCA for such men, especially when they are compared to each other).

To take an oversimplified example, assume that: (1) there are four men, A, B, C, and D, who share an MRCA who lived 1,000 years ago; and (2) over the past 1,000 years A's line and B's line have each had six mutations, all on different markers among the 111 markers tested by FTDNA, but C's line and D's line have had no mutations.

A and B would be 12 steps from each other, but they would each be only six steps from C and D. As a result, a TiP report or the Y-Utility would report A and B as being far more closely related to C and D than they are to each other, even though all four men descend from the same MRCA, who lived 1,000 years ago.

Similarly, the TiP report or the Y-Utility would report C and D as sharing a very recent MRCA, even though their shared marker values reflect a lack of mutations in their respective lines over a long period as opposed to a very recent shared common ancestor. For men who are very close to the mode, it will be difficult if not impossible, in the absence of shared deviations from the R1a1a Ashkenazi Levite modal haplotype, to determine whether an apparently close match reflects an actual close match or a relative lack of mutations in two lines.
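The step counts in the four-man example above can be checked with a short sketch. It assumes a simple stepwise model in which each mutation changes a marker value by one repeat, and it measures distance as the sum of absolute differences across the 111 markers. The modal value, the marker positions chosen for the mutations, and the function names are hypothetical and serve only to illustrate the arithmetic; this is not the method used by the TiP reports or the Y-Utility.

```python
from itertools import combinations

NUM_MARKERS = 111   # markers tested, as in the example above
MODAL_VALUE = 12    # hypothetical repeat count shared by every marker of the MRCA


def make_haplotype(mutated_markers):
    """Return the MRCA's haplotype with a one-step mutation on each listed marker."""
    haplotype = [MODAL_VALUE] * NUM_MARKERS
    for marker in mutated_markers:
        haplotype[marker] += 1
    return haplotype


def genetic_distance(first, second):
    """Count the steps between two haplotypes (sum of absolute differences)."""
    return sum(abs(a - b) for a, b in zip(first, second))


# A and B each carry six mutations on different markers; C and D carry none.
haplotypes = {
    "A": make_haplotype(range(0, 6)),
    "B": make_haplotype(range(6, 12)),
    "C": make_haplotype([]),
    "D": make_haplotype([]),
}

distances = {
    (x, y): genetic_distance(haplotypes[x], haplotypes[y])
    for x, y in combinations(sorted(haplotypes), 2)
}

for (x, y), steps in distances.items():
    print(f"{x}-{y}: {steps} steps")
```

All six pairs share the same MRCA, yet the pairwise step counts range from zero (C and D) to 12 (A and B), which is the spread that a purely pairwise calculation would read as very different times to an MRCA.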

It is possible that the discrepancy between SNP-predicted and STR-predicted times to an MRCA reflects the peculiar nature of the R1a1a Ashkenazi Levite population, with its multiple bottlenecks and relatively recent MRCA, rather than anything inherent in the use of STR marker values to calculate time to an MRCA.

Note that this dating problem arises because, where the sample size is very small (the sample size is two when calculating the time to an MRCA between two men), the random nature of STR mutations is likely to have a disproportionate effect upon calculations. It is possible, by considering more sets of marker values, to reduce the likelihood that calculations will be skewed by a disproportionately high or low number of mutations in particular lines. For this reason, the analysis set forth on this page does not call into question the STR-based calculations of time to an MRCA for R1a1a Ashkenazi Levites as a whole.
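The sample-size point can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not a model of any real calculator: the generation count (40, roughly 1,000 years at 25 years per generation), the uniform mutation rate of 0.002 per marker per generation, and all names are hypothetical, and the distance between two men is approximated as the sum of the mutations in their two lines (ignoring back mutations and parallel mutations on the same marker).

```python
import random
import statistics

NUM_MARKERS = 111
GENERATIONS = 40        # assumption: about 1,000 years at 25 years per generation
MUTATION_RATE = 0.002   # assumption: per marker per generation, illustrative only


def mutations_in_line(rng):
    """Count random mutations accumulated in one line since the MRCA."""
    trials = NUM_MARKERS * GENERATIONS
    return sum(1 for _ in range(trials) if rng.random() < MUTATION_RATE)


def pairwise_distance(rng):
    """Approximate the steps between two men as the mutations in both lines."""
    return mutations_in_line(rng) + mutations_in_line(rng)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Sample size of two: each estimate rests on a single pair of men.
single_pairs = [pairwise_distance(rng) for _ in range(200)]

# Larger samples: each estimate averages 25 independent pairs.
group_means = [
    statistics.mean(pairwise_distance(rng) for _ in range(25))
    for _ in range(40)
]

print("single pairs: min", min(single_pairs), "max", max(single_pairs))
print("spread of single pairs:", statistics.pstdev(single_pairs))
print("spread of 25-pair averages:", statistics.pstdev(group_means))
```

Every simulated pair has an MRCA at exactly the same time depth, so any spread in the single-pair distances comes from randomness alone. Averaging over many pairs narrows that spread considerably, which is why calculations for the group as a whole are less exposed to this problem than calculations for two men.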