[ensembl-dev] Difference in genomic coordinates between REFSEQ and ENSEMBL

Andy Yates ayates at ebi.ac.uk
Mon Feb 24 12:34:30 GMT 2014


Hi Duarte,

Not to worry. I'm glad that what you expect from us (with respect to the lowest and highest coordinates) is what we are doing. As for Ensembl and RefSeq transcripts, the two sets of models are separate database entities, and Ensembl gene coordinates make no attempt to bound RefSeq transcripts. So to answer your question: yes, you should take both Ensembl genes and RefSeq transcripts into account when calculating your bounding window.
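
As a rough illustration of that bounding-window calculation, here is a minimal Python sketch. It is not Ensembl API code: the Interval type and the bounding_window and contains helpers are hypothetical names made up for this example, and the coordinates are simply the two transcript ranges quoted in this thread, assumed to be 1-based and inclusive on the same assembly.

```python
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    """A genomic interval on one chromosome (assumed 1-based, inclusive)."""
    chrom: str
    start: int
    end: int


def bounding_window(intervals: Iterable[Interval]) -> Interval:
    """Return the minimum start and maximum end over all given intervals."""
    items = list(intervals)
    if not items:
        raise ValueError("need at least one interval")
    if len({iv.chrom for iv in items}) != 1:
        raise ValueError("all intervals must be on the same chromosome")
    return Interval(
        items[0].chrom,
        min(iv.start for iv in items),
        max(iv.end for iv in items),
    )


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (
        outer.chrom == inner.chrom
        and outer.start <= inner.start
        and inner.end <= outer.end
    )


# Coordinates as quoted in this thread (not fetched from any database).
ensembl_model = Interval("chr7", 16130817, 16460947)  # ENST00000407010
refseq_model = Interval("chr7", 16127152, 16460947)   # NM_001101426

# The Ensembl range alone does not contain the RefSeq transcript ...
ensembl_bounds_refseq = contains(ensembl_model, refseq_model)

# ... so the window has to be computed over both sets of models.
window = bounding_window([ensembl_model, refseq_model])

# Size of the discrepancy at the lower coordinate, in bases.
offset = ensembl_model.start - refseq_model.start
```

The same helper works unchanged if you pass in every Ensembl transcript of a gene plus every RefSeq transcript you have mapped to it; the only design choice is that the window is a plain minimum and maximum, which mirrors how the gene coordinates themselves are derived from the transcripts.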

Andy

On 24 Feb 2014, at 10:08, Duarte Molha <duartemolha at gmail.com> wrote:

> I understand the difference in the definition. I probably failed to explain my own understanding very well. 
> 
> Yes, your definition is what I agree with. In layman's terms, the start coordinate of the gene would be the most upstream start of any transcript (even if that transcript is not the longest), and the end coordinate would be the most downstream end of any transcript found (again, even if that transcript is not the longest in the set).
> 
> This is the kind of definition I would like to have,
> so that any RefSeq transcript of that gene should always be contained within the ENSG coordinates for that gene, correct?
> 
> In this case that does not hold.
> So here is my question reformulated:
> Can I not rely on the idea that the Ensembl gene coordinates will always encompass any RefSeq transcript for the gene of interest?
> In this case it appears I cannot, and I have many other examples of this in my dataset.
> 
> Best regards
> 
> Duarte
>  
> 
> =========================
>      Duarte Miguel Paulo Molha      
>          http://about.me/duarte         
> =========================
> 
> 
> On Mon, Feb 24, 2014 at 9:55 AM, Andy Yates <ayates at ebi.ac.uk> wrote:
> Hi Duarte,
> 
> Just to clarify one misconception here: Ensembl gene coordinates are the minimum start and maximum end of any transcript from the set linked to a gene (the coordinates which bound all transcripts). A gene's coordinates are not the same as those of its longest transcript model.
> 
> That doesn't explain the discrepancy you've seen between NM_001101426.3 and ENST00000407010. I can see from http://www.ensembl.org/Homo_sapiens/Share/17e6832cf57be0231caa268e919b3da4126347817 that this is caused by a longer 3' UTR in the RefSeq model. I do not know why that's the case. Hopefully someone else on the list will have a better idea.
> 
> Andy
> 
> On 24 Feb 2014, at 09:09, Duarte Molha <duartemolha at gmail.com> wrote:
> 
> > Dear Developers…
> >
> >
> > I was wondering if any of you could help me with a problem I am having comparing RefSeq with Ensembl transcripts…
> >
> > I had assumed that the gene start and end coordinates in Ensembl were obtained from the longest transcript model for each gene. However, this does not seem to be the case when comparing a list of around 300 genes I have queried.
> >
> >
> > Take a look at the example for transcript NM_001101426. In RefSeq this transcript has the coordinates chr7:16127152-16460947. However, if you search for it in Ensembl you get the transcript ENST00000407010 with the coordinates chr7:16130817-16460947.
> >
> > If we assume that Ensembl uses the longest transcript to determine the start and end coordinates, then the ISPD gene should start at 16127152 and not at 16130817. That is a difference of almost 4 kb. I understand the gene models are different and I would expect small differences between the two, but not a 4 kb difference. Can you explain the discrepancy?
> > Best regards
> > Duarte
> >
> > =========================
> >      Duarte Miguel Paulo Molha
> >          http://about.me/duarte
> > =========================
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/
> 
> 




