[ensembl-dev] Different UTRs for the same transcript?

Mon Mar 14 16:10:02 GMT 2011

Hi Holger
When you query the UTR information from mart, what you are actually  
getting is one result row per exon. Therefore if you add the exon_id  
to your attributes you will see that the results make more sense as  
you get a unique row per exon in each transcript. I hope that makes  
sense, but please get back to me if you have further questions.
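
A minimal sketch of what the per-exon rows look like and how they might be collapsed into one row per transcript. The data frame below is a mock: all IDs and coordinates are made up, and the column names (`utr5_start` and so on, plus `ensembl_exon_id` for the exon_id attribute) are assumptions chosen to be valid R names rather than the exact names the mart returns. The helper `collapse_utrs` is hypothetical and only keeps the outermost UTR boundaries, so a UTR split across several exons would be reported as one span that includes the introns in between.

```r
# Mock per-exon result: one row per exon, UTR columns filled only for exons
# that overlap a UTR. All IDs and coordinates are invented for illustration.
utr_rows <- data.frame(
  ensembl_transcript_id = c("T1", "T1", "T1", "T1", "T2", "T2"),
  ensembl_exon_id       = c("E1", "E2", "E3", "E4", "E5", "E6"),
  utr5_start = c(100, 300, NA, NA, 100, NA),
  utr5_end   = c(150, 320, NA, NA, 180, NA),
  utr3_start = c(NA, NA, NA, 900, NA, 700),
  utr3_end   = c(NA, NA, NA, 980, NA, 760),
  stringsAsFactors = FALSE
)

# Apply f (min or max) ignoring NAs; return NA if there is no value at all.
range_or_na <- function(x, f) {
  if (all(is.na(x))) NA_real_ else f(x, na.rm = TRUE)
}

# Hypothetical helper: collapse per-exon rows to one row per transcript,
# keeping the outermost 5' and 3' UTR boundaries.
collapse_utrs <- function(rows) {
  pieces <- split(rows, rows$ensembl_transcript_id)
  out <- lapply(pieces, function(p) {
    data.frame(
      ensembl_transcript_id = p$ensembl_transcript_id[1],
      utr5_start = range_or_na(p$utr5_start, min),
      utr5_end   = range_or_na(p$utr5_end, max),
      utr3_start = range_or_na(p$utr3_start, min),
      utr3_end   = range_or_na(p$utr3_end, max),
      n_exons    = nrow(p),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, out)
}

per_transcript <- collapse_utrs(utr_rows)
print(per_transcript)
```

If the individual UTR segments matter (for example to extract sequence), it is probably better to keep the per-exon rows and only group them by transcript rather than collapsing them.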
Regards
Rhoda

On 14 Mar 2011, at 14:42, Holger Brandl wrote:

> Hello,
>
> I'm using BioMart to access Ensembl. I'm interested in UTR regions,  
> so I'm using the following query:
> library(biomaRt)
> mart <- useDataset("mmusculus_gene_ensembl", mart = useMart("ensembl"))
> utrInfos <- getBM(
>   attributes = c('ensembl_gene_id', 'ensembl_transcript_id',
>                  '5_utr_start', '5_utr_end', '3_utr_end', '3_utr_start',
>                  'start_position', 'end_position',
>                  'transcript_start', 'transcript_end'),
>   filters = 'ensembl_gene_id',
>   values = 'ENSMUSG00000018733',
>   mart = mart)
>
> However, the result of this query seems to have a weird structure:  
> it contains 4 rows for each transcript, two of which contain  
> different 5' UTR boundaries.
> In contrast, what I would expect is a single row for each transcript  
> with 5' AND 3' UTR information (if available).
> I've tried the same query for other genes, but the results  
> always have a similar structure.
>
> The same happens if I run my query through the web interface. Here's  
> the URL for the above-mentioned example:
> http://www.ensembl.org/biomart/martview/f20886e0142055da2a2b0a9de30d5ca8/f20886e0142055da2a2b0a9de30d5ca8/f20886e0142055da2a2b0a9de30d5ca8?VIRTUALSCHEMANAME=default&ATTRIBUTES=mmusculus_gene_ensembl.default.structure.ensembl_gene_id|mmusculus_gene_ensembl.default.structure.ensembl_transcript_id|mmusculus_gene_ensembl.default.structure.5_utr_start|mmusculus_gene_ensembl.default.structure.5_utr_end|mmusculus_gene_ensembl.default.structure.3_utr_end|mmusculus_gene_ensembl.default.structure.3_utr_start|mmusculus_gene_ensembl.default.structure.start_position|mmusculus_gene_ensembl.default.structure.end_position|mmusculus_gene_ensembl.default.structure.transcript_start|mmusculus_gene_ensembl.default.structure.transcript_end&FILTERS=mmusculus_gene_ensembl.default.filters.ensembl_gene_id."ENSMUSG00000018733"&VISIBLEPANEL=resultspanel
>
> Do you have any ideas what the problem with my query could be?
>
> Best,
> Holger Brandl
> -- 
> Dr. Holger Brandl
> Bioinformatics Service
> Max Planck Institute of Molecular Cell Biology and Genetics
> Pfotenhauerstrasse 108
> 01307 Dresden, Germany
>
> Tel.:   +49/351/210-2738
> Fax:    +49 351 210 2000
> www:  http://www.mpi-cbg.de
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev

Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.
