[ensembl-dev] Question regarding transcript location data

Wed Jun 20 11:54:41 BST 2012

Hi Duarte,

We have only been able to map the RefSeq accession NM_001164277 to the transcript ENST00000577093 (from gene ENSG00000262676) in the patch region, as this was a better match than any transcript held by ENSG00000137700, which is the equivalent gene on chromosome 11. You can see this from the following alignments:

* Shows the alignment between NM_001164277.1 and ENST00000577093

http://www.ensembl.org/Homo_sapiens/Transcript/Similarity/Align?db=core;extdb=refseq_mrna;g=ENSG00000262676;r=HG299_PATCH:118895063-118901615;sequence=NM_001164277.1;t=ENST00000577093

* Shows a mismatch between chr11 & NM_001164277.1 in the 4th HSP

http://www.ensembl.org/Multi/blastview/BLA_NjAQP0GdI

* Shows ENST00000577093 as the top hit with 100% identity in a BLAST of all available cDNAs

http://www.ensembl.org/Multi/blastview?format=raw;ticket=BLA_YUG3Q7QWa;runnable=beBzEMtz;result=5618!!20120620

If you want to go after the genes on the non-patched regions then I can suggest two alternatives. The first would be to project the coordinates of the patch gene back to the reference, look for an overlapping gene & assume this is the same gene (could be a bit dodgy to do this). Otherwise I would BLAST your mis-hitting NMs against Ensembl's cDNAs and only take the top hit which is on a reference chromosome. The API can tell you this if you get a slice & call $slice->is_reference().
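
As a rough sketch of the second suggestion (in Python rather than the Perl API, and assuming you have already parsed the BLAST results into simple records), the filtering could look something like the following. The Hit class, the REFERENCE_REGIONS set, the placeholder chr11 transcript ID and the scores are all made up for illustration; with the Perl API you would call $slice->is_reference() instead of checking names against a fixed set.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: human reference sequence names as Ensembl reports them
# (no "chr" prefix). Anything else (e.g. HG299_PATCH) is treated as
# non-reference. The Perl API's $slice->is_reference() is the real check.
REFERENCE_REGIONS = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit (hypothetical record layout)."""
    transcript_id: str
    seq_region: str
    score: float


def is_reference(seq_region: str) -> bool:
    """Stand-in for $slice->is_reference(): True for primary assembly names."""
    return seq_region in REFERENCE_REGIONS


def best_reference_hit(hits: Iterable[Hit]) -> Optional[Hit]:
    """Return the highest-scoring hit on a reference chromosome, or None."""
    ref_hits = [h for h in hits if is_reference(h.seq_region)]
    return max(ref_hits, key=lambda h: h.score, default=None)


# Illustrative input only: the scores and the chr11 transcript ID are
# placeholders, not real BLAST output.
hits = [
    Hit("ENST00000577093", "HG299_PATCH", 100.0),
    Hit("ENST_CHR11_EXAMPLE", "11", 99.0),
]

best = best_reference_hit(hits)
print(best.transcript_id if best else "no reference hit")
```

The patch transcript still scores highest overall, but it is skipped because its sequence region is not a reference chromosome, so the best remaining chr11 hit is the one reported.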

Hope this helps,

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 20 Jun 2012, at 09:48, Duarte Molha wrote:

> Some of my transcript IDs have a perfectly correct genomic position in UCSC
>  
> As an example, the NM ID - NM_001164277
>  
> Has a valid genomic location in UCSC:
>  
> http://genome.ucsc.edu/cgi-bin/hgTracks?hgHubConnect.destUrl=..%2Fcgi-bin%2FhgTracks&clade=mammal&org=Human&db=hg19&position=NM_001164277&hgt.suggest=&hgt.suggestTrack=knownGene&Submit=submit&hgsid=279134705
>  
> and in ENSEMBL there is no genomic position info... only the patch ID
>  
> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=1;q=NM_001164277
>  
> So, ideally, what I would need is a way of converting the patch ID into a valid chromosome number.
>  
> Is there an easy way of doing this?
>  
> Best regards
>  
> Duarte
>  
>  
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Duarte Molha
> Sent: 20 June 2012 09:20
> To: Ensembl developers list
> Subject: [ensembl-dev] Question regarding transcript location data
>  
> Dear developers
>  
> I have a script that retrieves intron and exon information from an input NM ID. It is working pretty well but I have stumbled on a few NM IDs and I would like to know what I can do to correct the behaviour of the script.
>  
> As an example if I input this ID:
>  
> NM_173471
>  
> My script outputs:
>  
> INPUT_ID    CHR              start      end        ENST_ID          EXON_ID              strand
> NM_173471   chrHG991_PATCH   66119285   66119659   ENST00000566782  ENSE00002619671  0   +1
> NM_173471   chrHG991_PATCH   66298434   66298819   ENST00000566782  ENSE00002619173  0   +1
> ... <abbreviated>
>  
> However, what I would have liked the script to output was the correct genomic location of this transcript... in this case chr3.
> Can you tell me how I can change my script so that it gets the correct chr location instead of these PATCH IDs?
>  
> Basically I would have wanted the script to output the exon/intron data from the first entry on this link:
>  
> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Transcript;end=2;q=NM_173471
>  
>  
> Best regards,
>  
> Duarte Molha