[ensembl-dev] Biomart inconsistency

Rhoda Kinsella rhoda at ebi.ac.uk
Wed Aug 1 08:59:49 BST 2012


Hi Ivan
This is a well-known issue with BioMart which we have fed back to the BioMart developers. If you take a look at this gene on the Ensembl website you will understand what is happening here:

http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000072110;r=14:69340860-69446157


You will see that this gene has 21 transcripts. This corresponds to your first query in BioMart. When you add the attribute for UniProt accession, this essentially acts like a filter, as you only retrieve transcripts that are protein coding (there are 14 protein-coding transcripts, which corresponds to your second query in BioMart). The tools we use to build the gene mart do not currently allow us to add the left join needed to retrieve all transcripts in the results, whether protein coding or not. The reason you don't see this issue in the NCBI36 BioMart is that there were only 2 protein-coding transcripts for this gene in 2009, so you will retrieve everything.

http://may2009.archive.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000072110;r=14:69340860-69446157
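
To illustrate the join behaviour described above, here is a minimal sketch using Python's built-in sqlite3 module. The two toy tables and their column names (transcript, translation_xref, uniprot) are made up for illustration and are not the real mart schema; the three transcript IDs are taken from your results. With an inner join the non-coding transcript drops out, while a left join would keep it with an empty UniProt column.

```python
import sqlite3


def join_rows(join_type):
    """Return (gene, transcript, uniprot) rows using an INNER or LEFT join.

    The tables below are a toy model, not the actual BioMart schema.
    """
    if join_type not in ("INNER", "LEFT"):
        raise ValueError("join_type must be 'INNER' or 'LEFT'")

    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE transcript (gene_id TEXT, transcript_id TEXT);
        CREATE TABLE translation_xref (transcript_id TEXT, uniprot TEXT);

        -- Two protein-coding transcripts and one non-coding transcript.
        INSERT INTO transcript VALUES
            ('ENSG00000072110', 'ENST00000193403'),
            ('ENSG00000072110', 'ENST00000376839'),
            ('ENSG00000072110', 'ENST00000556083');

        -- Only protein-coding transcripts have a row here; the second one
        -- has a translation but no UniProt/SwissProt accession.
        INSERT INTO translation_xref VALUES
            ('ENST00000193403', 'P12814'),
            ('ENST00000376839', NULL);
        """
    )
    query = (
        "SELECT t.gene_id, t.transcript_id, x.uniprot "
        "FROM transcript AS t "
        f"{join_type} JOIN translation_xref AS x "
        "ON x.transcript_id = t.transcript_id "
        "ORDER BY t.transcript_id"
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for join_type in ("INNER", "LEFT"):
        print(join_type, "JOIN:")
        for row in join_rows(join_type):
            print("   ", row)
```

In this toy model the inner join mirrors what you currently see when the UniProt attribute is added, and the left join shows the result we would like the mart to return.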

I hope this makes sense but please don't hesitate to get back to me if you require more information.
Regards
Rhoda


On 31 Jul 2012, at 20:24, Ivan Kel wrote:

> Greetings, 
> 
> I am using Ensembl Biomart to map Ensembl Gene IDs to Transcript IDs and UniProt/SwissProt Accession numbers. 
> Surprisingly, in several cases the corresponding Transcript IDs found for a Gene ID differ depending on whether or not I add the UniProt number to the search.
> To clarify here is an example:
> Ensembl Gene ID: ENSG00000072110
> Result using only GeneID and TranscriptID: 
> Ensembl Gene ID	Ensembl Transcript ID
> ENSG00000072110	ENST00000193403
> ENSG00000072110	ENST00000556083
> ENSG00000072110	ENST00000553882
> ENSG00000072110	ENST00000394419
> ENSG00000072110	ENST00000438964
> ENSG00000072110	ENST00000376839
> ENSG00000072110	ENST00000555075
> ENSG00000072110	ENST00000538545
> ENSG00000072110	ENST00000544964
> ENSG00000072110	ENST00000553290
> ENSG00000072110	ENST00000556432
> ENSG00000072110	ENST00000556343
> ENSG00000072110	ENST00000555616
> ENSG00000072110	ENST00000556433
> ENSG00000072110	ENST00000554508
> ENSG00000072110	ENST00000554158
> ENSG00000072110	ENST00000553370
> ENSG00000072110	ENST00000553779
> ENSG00000072110	ENST00000556571
> ENSG00000072110	ENST00000553659
> ENSG00000072110	ENST00000556203
>   
> Result using GeneID, TranscriptID and UniProtID:
> Ensembl Gene ID	Ensembl Transcript ID	UniProt/SwissProt Accession
> ENSG00000072110	ENST00000193403	P12814
> ENSG00000072110	ENST00000394419	P12814
> ENSG00000072110	ENST00000438964	P12814
> ENSG00000072110	ENST00000376839	
> ENSG00000072110	ENST00000555075	
> ENSG00000072110	ENST00000538545	
> ENSG00000072110	ENST00000544964	
> ENSG00000072110	ENST00000553290	
> ENSG00000072110	ENST00000555616	
> ENSG00000072110	ENST00000556433	
> ENSG00000072110	ENST00000553370	
> ENSG00000072110	ENST00000553779	
> ENSG00000072110	ENST00000556571	
> ENSG00000072110	ENST00000553659
>  
> 
> 
> Please notice that the transcripts found for the Gene ENSG00000072110 differ between the two cases (e.g. ENST00000556083 is not present in the second results).
> 
> For this analysis I use the current Biomart version. This problem does not occur if I use the older Biomart (hg18, Biomart archive from 2009, NCBI36). 
> 
> Am I missing something?
> 
> Thank you very much in advance.
> 
> Ivan
> 
