[ensembl-dev] Question regarding transcript location data
Andy Yates
ayates at ebi.ac.uk
Wed Jun 20 11:54:41 BST 2012
Hi Duarte,
We have only been able to map the RefSeq accession NM_001164277 to the transcript ENST00000577093 (from gene ENSG00000262676) in the patch region, because this was a better match than any transcript of ENSG00000137700, which is the equivalent gene on chromosome 11. You can see this from the following alignments:
* Shows the alignment between NM_001164277.1 and ENST00000577093
http://www.ensembl.org/Homo_sapiens/Transcript/Similarity/Align?db=core;extdb=refseq_mrna;g=ENSG00000262676;r=HG299_PATCH:118895063-118901615;sequence=NM_001164277.1;t=ENST00000577093
* Shows a mismatch between chr11 and NM_001164277.1 in the 4th HSP
http://www.ensembl.org/Multi/blastview/BLA_NjAQP0GdI
* Shows ENST00000577093 as the top hit with 100% identity in a blast of all available cDNAs
http://www.ensembl.org/Multi/blastview?format=raw;ticket=BLA_YUG3Q7QWa;runnable=beBzEMtz;result=5618!!20120620
If you want to go after the genes on the non-patched regions then I can suggest two alternatives. The first would be to project the coordinates of the patch gene back to the reference, look for an overlapping gene and assume it is the same gene (this could be a bit dodgy). Otherwise I would BLAST your mis-hitting NMs against Ensembl's cDNAs and take only the top hit which is on a reference chromosome. The API can tell you this if you get a slice and call $slice->is_reference().
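The second alternative boils down to a small filtering step, which can be sketched as follows. This is only an illustration in Python, not Ensembl API code: the `Hit` record, its field names, the scores and the placeholder transcript ID are all made up for the example, and the `is_reference` flag is assumed to have been filled in beforehand from the result of `$slice->is_reference()` for the slice each hit lies on.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    """One BLAST hit of a RefSeq accession against a cDNA (hypothetical record)."""
    query: str          # RefSeq accession used as the BLAST query
    transcript: str     # stable ID of the cDNA that was hit
    seq_region: str     # name of the region the transcript lies on
    score: float        # BLAST score, higher is better
    is_reference: bool  # assumed to come from $slice->is_reference()


def best_reference_hit(hits: Iterable[Hit]) -> Optional[Hit]:
    """Return the highest-scoring hit on a reference region, or None."""
    # Drop anything on a patch or other non-reference region first.
    reference_hits = [h for h in hits if h.is_reference]
    if not reference_hits:
        return None
    return max(reference_hits, key=lambda h: h.score)


# Illustrative input only: the scores and the chr11 transcript ID are invented.
hits = [
    Hit("NM_001164277", "ENST00000577093", "HG299_PATCH", 100.0, False),
    Hit("NM_001164277", "HYPOTHETICAL_CHR11_TRANSCRIPT", "11", 98.5, True),
]

best = best_reference_hit(hits)
if best is not None:
    print(best.query, best.transcript, best.seq_region)
```

Filtering before ranking means the patch transcript is never returned, even though it is the best overall match. If a query has no hit on a reference region at all, the function returns None rather than falling back to the patch.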
Hope this helps,
Andy
Andrew Yates Ensembl Core Software Project Leader
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensembl.org/
On 20 Jun 2012, at 09:48, Duarte Molha wrote:
> Some of my transcript IDs have a perfectly correct genomic position in UCSC
>
> As an example, the NM ID - NM_001164277
>
> Has a valid genomic location in UCSC:
>
> http://genome.ucsc.edu/cgi-bin/hgTracks?hgHubConnect.destUrl=..%2Fcgi-bin%2FhgTracks&clade=mammal&org=Human&db=hg19&position=NM_001164277&hgt.suggest=&hgt.suggestTrack=knownGene&Submit=submit&hgsid=279134705
>
> and in ENSEMBL there is no genomic position info... only the patch ID
>
> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=1;q=NM_001164277
>
> So ideally what I would need is a way of converting the patch ID into a valid chromosome number.
>
> Is there an easy way of doing this?
>
> Best regards
>
> Duarte
>
>
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Duarte Molha
> Sent: 20 June 2012 09:20
> To: Ensembl developers list
> Subject: [ensembl-dev] Question regarding transcript location data
>
> Dear developers
>
> I have a script that retrieves intron and exon information from an input NM ID. It is working pretty well, but I have stumbled on a few NM IDs and I would like to know what I can do to correct the behaviour of the script.
>
> As an example if I input this ID:
>
> NM_173471
>
> My script outputs:
>
> INPUT_ID CHR start end ENST_ID EXON_ID strand
> NM_173471 chrHG991_PATCH 66119285 66119659 ENST00000566782 ENSE00002619671 0 +1
> NM_173471 chrHG991_PATCH 66298434 66298819 ENST00000566782 ENSE00002619173 0 +1
> ... <abbreviated>
>
> However, what I would have liked the script to output was the correct genomic location of this transcript, in this case chr3.
> Can you tell me how I can change my script so that it gets the correct chr location instead of these PATCH IDs?
>
> Basically, I would have wanted the script to output the exon/intron data from the first entry on this link:
>
> http://www.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Transcript;end=2;q=NM_173471
>
>
> Best regards,
>
> Duarte Molha
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/