[ensembl-dev] Question regarding transcript location data
ayates at ebi.ac.uk
Wed Jun 20 11:54:41 BST 2012
We have only been able to map the NM number NM_001164277 to the transcript ENST00000577093 (from gene ENSG00000262676) in the patch region, as this was a better match than any transcript held by ENSG00000137700, which is the equivalent gene on chromosome 11. You can see this from the following alignments:
* Shows the alignment between NM_001164277.1 and ENST00000577093
* Shows a mis-match between chr11 & NM_001164277.1 in the 4th HSP
* Shows ENST00000577093 as the top hit with 100% identity in a blast of all available cDNAs
If you want to go after the genes on the non-patched regions then I can suggest two alternatives. The first would be to project the coordinates of the patch gene back to the reference, look for an overlapping gene and assume it is the same gene (this could be a bit dodgy). Otherwise I would BLAST your mis-hitting NMs against Ensembl's cDNAs and take only the top hit that is on a reference chromosome. The API can tell you this if you get a slice and call $slice->is_reference().
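The second alternative can be sketched roughly as follows. This is only an illustration in Python, not Ensembl API code: it assumes BLAST tabular output (-outfmt 6), and it assumes you have already looked up which sequence region each cDNA sits on. The function name, the REFERENCE_NAMES set, the placeholder transcript IDs, the patch name and the scores are all hypothetical; in a real script the reference check would come from $slice->is_reference().

```python
import csv
import io

# Hypothetical list of reference sequence names. In a real script this check
# would be answered per slice by the Ensembl API ($slice->is_reference()).
REFERENCE_NAMES = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def top_reference_hits(blast_tsv, hit_regions, reference_names=REFERENCE_NAMES):
    """Return {query: (subject, bitscore)} for the best-scoring hit per query
    whose subject lies on a reference sequence region.

    blast_tsv   -- BLAST tabular output (-outfmt 6, 12 tab-separated columns)
    hit_regions -- dict mapping subject (cDNA) ID to its sequence region name
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        # Drop hits on patches or any region not known to be reference.
        if hit_regions.get(subject) not in reference_names:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Illustrative input only: placeholder subject IDs and made-up scores.
SAMPLE = "\n".join([
    "NM_001164277.1\tENST_PATCH_EXAMPLE\t100.00\t1500\t0\t0\t1\t1500\t1\t1500\t0.0\t2700",
    "NM_001164277.1\tENST_CHR11_EXAMPLE\t99.47\t1500\t8\t0\t1\t1500\t1\t1500\t0.0\t2650",
])
REGIONS = {
    "ENST_PATCH_EXAMPLE": "HG_PATCH_EXAMPLE",  # hypothetical patch region name
    "ENST_CHR11_EXAMPLE": "11",
}

print(top_reference_hits(SAMPLE, REGIONS))
```

With this input the patch transcript scores higher but is filtered out, so the chromosome 11 placeholder is kept as the top reference hit. Subjects with no known region are also dropped, which errs on the side of missing a hit rather than reporting a patch location.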
Hope this helps,
Andrew Yates Ensembl Core Software Project Leader
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensembl.org/
On 20 Jun 2012, at 09:48, Duarte Molha wrote:
> Some of my transcript IDs have a perfectly correct genomic position in UCSC
> As an example, the NM ID - NM_001164277
> Has a valid genomic location in UCSC:
> and in ENSEMBL there is no genomic position info... only the patch ID
> So, ideally, what I would need is a way of converting the patch ID into a valid chromosome number.
> Is there an easy way of doing this?
> Best regards
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Duarte Molha
> Sent: 20 June 2012 09:20
> To: Ensembl developers list
> Subject: [ensembl-dev] Question regarding transcript location data
> Dear developers
> I have a script that retrieves intron and exon information from an input NM ID. It is working pretty well, but I have stumbled on a few NM IDs and I would like to know what I can do to correct the behaviour of the script.
> As an example if I input this ID:
> My script outputs:
> INPUT_ID CHR start end ENST_ID EXON_ID strand
> NM_173471 chrHG991_PATCH 66119285 66119659 ENST00000566782 ENSE00002619671 0 +1
> NM_173471 chrHG991_PATCH 66298434 66298819 ENST00000566782 ENSE00002619173 0 +1
> ... <abbreviated>
> However, what I would have liked the script to output was the correct genomic location of this transcript... in this case chr3.
> Can you tell me how I can change my script so that it gets the correct chr location instead of these PATCH IDs?
> Basically, I would have wanted the script to output the exon/intron data from the first entry on this link:
> Best regards,
> Duarte Molha
> Dev mailing list Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/