
I clarified my question by editing the title. Using the comments and links below, I have answered it myself:

The answer to my question is: the similarity is between 98.4% and 99.9%, depending on which types of difference you include. The numbers come from writing my DNA and the other person's DNA as long strings and counting the differences (a changed letter, a missing or inserted letter). The most important differences seem to be SNPs (single-letter differences at a given position, in coding or non-coding sequence), which account for a 0.1% difference between our two DNAs.

I have struck out the original question text, because it includes some motivation and interpretation which seem to be false.

edit: It was suggested that the question is a duplicate of "Same" DNA vs genes, but this does not seem to be the case, as this question only concerns the metric used (compare answer below).

99.9% vs 88%:

Two people's DNA is 99.9% the same (99% in some sources), e.g. http://book.bionumbers.org/how-genetically-similar-are-two-random-people/

Does this number include differences in copy-number variation, and maybe other kinds of variation?

On the other hand, https://www.nature.com/scitable/content/global-variation-in-copy-number-in-the-13571 says that there may be up to 12% of some other differences (I am not sure what those differences mean).

Question:

I have been poring over the internet and books, but have not yet understood which metric this 99.9% number refers to. (I am a pure mathematician.)

My hypothesis about what it means:

If we look at a 1000 base-pair long snippet of the DNA of person A, then we will find this same snippet somewhere also in person B's DNA, with just one difference in one of the 1000 base pairs.

Now, if the latter is true, then my DNA may still be significantly different from my neighbour's, e.g. by containing a different number of repetitions in the non-coding DNA.

Ultimately, my DNA could be only, say, 2% similar instead of 99.9% (depending on the number of differing repetitions) if we used as a metric the number of single-base changes (including single deletions and insertions) needed to go from the DNA of person A to the DNA of person B, assuming each person's DNA is written down as one long string. Question 2: In fact, I would be quite interested to know how similar the DNA of two people is under this last metric.
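
To make this last metric concrete: counting single-letter substitutions, insertions and deletions between two strings is the Levenshtein (edit) distance. Below is a minimal Python sketch of it on toy strings. The function names `edit_distance` and `similarity`, and the choice to normalise by the longer string's length, are assumptions made for illustration only. This quadratic-time version is also far too slow for genome-length strings, so it should not be read as how real genomes are compared.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-letter substitutions, insertions and
    deletions needed to turn string a into string b (Levenshtein)."""
    # prev[j] = distance between the letters of a processed so far
    # (initially none) and the first j letters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i letters of a into an empty string: i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Toy example 1: one changed letter in a 10-letter snippet.
print(similarity("ACGTACGTAC", "ACGTTCGTAC"))

# Toy example 2: same flanking sequence, but 3 vs 5 copies of a "TA" repeat.
person_a = "ACG" + "TA" * 3 + "GGC"
person_b = "ACG" + "TA" * 5 + "GGC"
print(edit_distance(person_a, person_b), similarity(person_a, person_b))
```

In the second toy example the only difference is the number of repeats, yet every extra repeated letter counts as one insertion, which is why the similarity under this metric depends so strongly on whether such differences are included.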

tyrex
  • Please quote the exact passage in the Nature paper which you think says that there may be 12% of other changes. I have looked at the paper, searched for "12%", and all I can find is the statement "A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations." If you think that this means that individuals differ by 12%, you are mistaken; your question is then based on a false premise and should therefore be withdrawn. – David Apr 29 '18 at 17:43
  • @David Thanks for the comment. My question is fully independent of the 12% number, so withdrawal may be a bit too much. I only use the number in my motivation. You may be correct that I misunderstood, because the sentence you mention is the one I am referring to. Compare also the BBC article http://news.bbc.co.uk/2/hi/science/nature/6174510.stm. My misunderstanding possibly reinforces the need for an answer to my question: which metric are we talking about? If you have explanations or links/books that can give me a better understanding, I am happy to learn, as I have little clue right now :) – tyrex Apr 29 '18 at 21:33
  • The 99.9% metric is quite straightforward (a simple base-for-base comparison between two DNAs) and is dealt with in my answer to this question. If you want to analyse something different, like gene duplications (as in the paper you cite), you need to understand the biology and devise your own metric. Any paper that uses such a metric will have it defined (possibly in the supplementary material). – David Apr 30 '18 at 09:56
