I'm reading Yang's Computational Molecular Evolution, and in the very first chapter it says the p-distance is a "simplistic distance measure" between two sequences. It is the number of sites that differ divided by the total number of sites compared.
However, because back mutations and multiple hits at the same site are not counted, the p-distance becomes inaccurate over longer distances (a large number of generations back to the most recent common ancestor, MRCA, I guess).
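To be concrete about how I understand that definition, here is a minimal sketch. The function name `p_distance` and the toy sequences are my own and not from the book, and I'm assuming the two sequences are already aligned, have no gaps, and are the same length.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count the positions where the two sequences carry different characters.
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


# Toy aligned sequences, invented for illustration: one difference in ten sites.
print(p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

The p-distance only sees the observed differences, so a site that changed twice counts once at most.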
"The raw proportion p is usable only for highly similar sequences, with p <0.05 (5%, approximately)."
How far back in divergence time do we have to go before this p < 5% threshold is violated? I assume it has to do with the mutation rate, right? I'm trying to get a sense of scale for this. Are we talking about a distance of just a few generations (within a species), the distance between two closely related species, or something even greater?
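To show the kind of back-of-the-envelope calculation I have in mind, here is a rough sketch. It rests on my own assumptions, not the book's: that for small p, differences accumulate roughly linearly on both lineages, so p ≈ 2 * mu * t, with mu the substitution rate per site per generation and t the number of generations back to the MRCA. The value of mu below is a placeholder I made up to exercise the function, not a measured rate.

```python
def generations_to_reach(p: float, mu: float) -> float:
    """Generations back to the MRCA at which the expected p-distance reaches p.

    Assumes the small-p linear approximation p ~= 2 * mu * t, i.e. both
    lineages accumulate substitutions at rate mu per site per generation
    and multiple hits are ignored.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    return p / (2.0 * mu)


# Hypothetical placeholder rate, chosen only for illustration.
MU_PLACEHOLDER = 1e-8
print(generations_to_reach(0.05, MU_PLACEHOLDER))
```

Is this the right way to think about it, and what realistic rates should go in for mu?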