I'm looking at a genome sequence for 2019-nCoV on NCBI. The FASTA sequence looks like this:
>MN988713.1 Wuhan seafood market pneumonia virus isolate 2019-nCoV/USA-IL1/2020, complete genome
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA
CGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAAC
TAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTG
...
...
TTAATCAGTGTGTAACATTAGGGAGGACTTGAAAGAGCCACCACATTTTCACCGAGGCCACGCGGAGTAC
GATCGAGTGTACAGTGAACAATGCTAGGGAGAGCTGCCTATATGGAAGAGCCCTAATGTGTAAAATTAAT
TTTAGTAGTGCTATCCCCATGTGATTTTAATAGCTTCTTAGGAGAATGACAAAAAAAAAAAA
Coronavirus is an RNA virus, so I was expecting the sequence to consist of AUGC characters. But the letters here are ATGC, which looks like DNA!
I found a possible answer: that this is the sequence of a "complementary DNA" (cDNA). I read that:
"The term cDNA is also used, typically in a bioinformatics context, to refer to an mRNA transcript's sequence, expressed as DNA bases (GCAT) rather than RNA bases (GCAU)."
However, I don't believe that I'm looking at a cDNA in the sense of a strand complementary to the RNA. If that were true, the end of the true mRNA sequence would be ...UCUUACUGUUUUUUUUUUUU, i.e. a "poly(U)" tail. But I believe the coronavirus has a poly(A) tail.
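To make that reasoning concrete, here is a minimal Python sketch of the base-by-base complement I have in mind. It assumes the last FASTA line quoted above is the 3' end of the record; the names `fasta_tail` and `complement_as_rna` are made up for illustration. It only pairs each letter with its partner (A↔U, C↔G) and does not reverse the strand.

```python
# Last line of the FASTA record quoted above (DNA letters, as shown by NCBI).
fasta_tail = "TTTAGTAGTGCTATCCCCATGTGATTTTAATAGCTTCTTAGGAGAATGACAAAAAAAAAAAA"

# Map each DNA letter to the RNA letter it would base-pair with.
DNA_TO_COMPLEMENT_RNA = str.maketrans("ACGT", "UGCA")


def complement_as_rna(dna: str) -> str:
    """Return the base-by-base RNA complement of a DNA string (not reversed)."""
    return dna.upper().translate(DNA_TO_COMPLEMENT_RNA)


complement_tail = complement_as_rna(fasta_tail)

# Under the "this is the complementary strand" theory, the real RNA would
# end like this: a run of U rather than a run of A.
print(complement_tail[-20:])
```

If the theory were right, the printed tail would be the true 3' end of the viral RNA, and it ends in a run of U, which is why I doubt it.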
I also found that all of the highlighted genes begin with the sequence ATG. This is the DNA equivalent of the RNA start codon AUG.
So, I believe what I'm looking at is the true mRNA, in 5'→3' direction, but with all U converted to T.
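Here is a small Python sketch of this second interpretation, under the assumption that the FASTA record is simply the RNA sequence written with T in place of U. The helper names `dna_letters_to_rna` and `trailing_run_length` are hypothetical, and `fasta_tail` is just the last line of the record quoted above.

```python
# Last line of the FASTA record quoted above (DNA letters, as shown by NCBI).
fasta_tail = "TTTAGTAGTGCTATCCCCATGTGATTTTAATAGCTTCTTAGGAGAATGACAAAAAAAAAAAA"


def dna_letters_to_rna(seq: str) -> str:
    """Rewrite a sequence given in DNA letters using the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def trailing_run_length(seq: str, base: str) -> int:
    """Count how many times `base` repeats at the very end of `seq`."""
    return len(seq) - len(seq.rstrip(base))


rna_tail = dna_letters_to_rna(fasta_tail)

# With a plain T -> U substitution the 3' end keeps its run of A,
# which is what a poly(A) tail would look like.
print(rna_tail[-20:])
print(trailing_run_length(rna_tail, "A"))

# The same substitution turns the ATG seen at gene starts into AUG.
print(dna_letters_to_rna("ATG"))
```

Under this reading, both observations line up: the tail stays poly(A) and ATG maps to the AUG start codon.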
So, is this really what I'm looking at? Is this some formatting/representation issue? Or does 2019-nCoV really contain DNA, rather than RNA?
AAA(3' poly(A) tail) at the end of that sequence. Am I confusing multiple formats here or is your sequence missing a part? Is this the result of transcribing RNA as DNA? – Mast Feb 10 '20 at 07:48