I have a mouse annotation file, gencode.vM1.annotation. It lists exons, genes, transcripts and UTRs, and each feature has a start and an end position.
I need to find a TSS for each gene. Would it be wrong to define a single TSS for all transcripts belonging to one gene? That is, I want to define the TSS of each gene as the minimum UTR start across all of its transcripts.
Example:
gene id, transcript id, feature, start, end
gene A, transcript A1, UTR, 102, 205
gene A, transcript A1, UTR, 506, 520
gene A, transcript A2, UTR, 78, 205
gene A, transcript A2, UTR, 502, 512
So, for gene A I would define the TSS at position 78 (the minimum UTR start) and the end of the gene body at position 520 (the maximum UTR end).
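
To make the rule concrete, here is a minimal Python sketch of what I have in mind. It assumes the annotation has already been parsed into tuples in the column order of the example above; the function name gene_bounds is just something I made up for illustration. It also ignores strand, which the example does not list, so it only takes the smallest start and the largest end per gene.

```python
# Example rows from above: (gene id, transcript id, feature, start, end)
rows = [
    ("gene A", "transcript A1", "UTR", 102, 205),
    ("gene A", "transcript A1", "UTR", 506, 520),
    ("gene A", "transcript A2", "UTR", 78, 205),
    ("gene A", "transcript A2", "UTR", 502, 512),
]


def gene_bounds(rows):
    """Return {gene id: (minimum UTR start, maximum UTR end)} over all transcripts."""
    bounds = {}
    for gene_id, _transcript_id, feature, start, end in rows:
        if feature != "UTR":
            continue  # only UTR rows are used in this definition
        if gene_id in bounds:
            lo, hi = bounds[gene_id]
            bounds[gene_id] = (min(lo, start), max(hi, end))
        else:
            bounds[gene_id] = (start, end)
    return bounds


print(gene_bounds(rows))  # {'gene A': (78, 520)}
```

For gene A this gives 78 as the proposed TSS and 520 as the proposed gene body end, matching the numbers above.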