I'm reading Yang's Computational Molecular Evolution, and in the very first chapter it says the p distance is a "simplistic distance measure" between two sequences. It is the number of sites that differ divided by the total number of sites compared.
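To make sure I have the definition right, here is a minimal sketch of how I understand it. It assumes the two sequences are already aligned, have equal length, and contain no gaps; the function name `p_distance` is my own and not from the book.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ.

    Assumes the sequences are pre-aligned, equal in length, and gap-free.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count the sites where the two sequences carry different characters.
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# One difference in ten sites.
print(p_distance("ACGTACGTAC", "ACGTACGTAA"))
```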

However, because back mutations and multiple hits at the same site hide some of the changes, it becomes inaccurate over longer distances (a large number of generations back to the most recent common ancestor, MRCA, I guess).
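As a rough illustration of why multiple hits matter, here is a sketch that assumes the Jukes-Cantor (JC69) model, which is my choice for illustration and not something the quoted passage specifies. Under JC69 the expected proportion of differing sites is p = 3/4 (1 - e^(-4d/3)), where d is the expected number of substitutions per site separating the two sequences. The function names are hypothetical.

```python
import math


def jc69_expected_p(d: float) -> float:
    """Expected p distance under JC69 for a true distance d (substitutions per site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """JC69-corrected distance for an observed p distance (valid for 0 <= p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Compare the raw proportion with the corrected distance as divergence grows.
for d in (0.01, 0.05, 0.2, 0.5, 1.0):
    print(f"d = {d:.2f}  expected p = {jc69_expected_p(d):.4f}")
```

Under this assumed model, p is always smaller than d once d is above zero, and the gap widens as d grows. At p = 0.05 the corrected distance is about 0.0517, so the correction is still small at that point.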

"The raw proportion p is usable only for highly similar sequences, with p <0.05 (5%, approximately)."

How far back in divergence time do we have to go before p exceeds this 5% threshold? I assume it depends on the mutation rate, right? I'm trying to get a sense of scale for myself. Are we talking about a distance of just a few generations (within a species), the distance between two closely related species, or an even greater distance?

rg255

0 Answers