[ensembl-dev] Biomart inconsistency

Ivan Kel ikel at MIT.EDU
Wed Aug 1 20:18:11 BST 2012


Hey Rhoda,

Thank you very much for your answer. It makes sense to me now.
Best regards,
Ivan

2012/8/1 Rhoda Kinsella <rhoda at ebi.ac.uk>

> Hi Ivan
> This is a well-known issue with BioMart which we have fed back to the
> BioMart developers. If you take a look at this gene on the Ensembl website
> you will understand what is happening here:
>
>
> http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000072110;r=14:69340860-69446157
>
>
> You will see that this gene has 21 transcripts. This corresponds to your
> first query in BioMart. When you add the attribute for UniProt accession,
> this essentially acts like a filter, as you only retrieve transcripts that
> are protein coding (there are 14 protein-coding transcripts, which
> corresponds to your second query in BioMart). The tools we use to build the
> gene mart do not currently allow us to add the left join needed to
> retrieve all transcripts in the results, whether protein coding or not.
> The reason you don't see this issue in the NCBI36 BioMart is that there
> were only 2 protein-coding transcripts for this gene in 2009, so you will
> retrieve everything.
>
>
> http://may2009.archive.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000072110;r=14:69340860-69446157
>
> I hope this makes sense but please don't hesitate to get back to me if you
> require more information.
> Regards
> Rhoda
>
>
> On 31 Jul 2012, at 20:24, Ivan Kel wrote:
>
> Greetings,
>
> I am using Ensembl Biomart to map Ensembl Gene IDs to Transcript IDs and
> UniProt/SwissProt Accession numbers.
> Surprisingly, in several cases the corresponding Transcript IDs found for
> a Gene ID differ depending on whether or not I add the UniProt number to
> the search.
> To clarify, here is an example:
> Ensembl Gene ID: ENSG00000072110
> Result using only GeneID and TranscriptID:
> Ensembl Gene ID    Ensembl Transcript ID
> ENSG00000072110    ENST00000193403
> ENSG00000072110    ENST00000556083
> ENSG00000072110    ENST00000553882
> ENSG00000072110    ENST00000394419
> ENSG00000072110    ENST00000438964
> ENSG00000072110    ENST00000376839
> ENSG00000072110    ENST00000555075
> ENSG00000072110    ENST00000538545
> ENSG00000072110    ENST00000544964
> ENSG00000072110    ENST00000553290
> ENSG00000072110    ENST00000556432
> ENSG00000072110    ENST00000556343
> ENSG00000072110    ENST00000555616
> ENSG00000072110    ENST00000556433
> ENSG00000072110    ENST00000554508
> ENSG00000072110    ENST00000554158
> ENSG00000072110    ENST00000553370
> ENSG00000072110    ENST00000553779
> ENSG00000072110    ENST00000556571
> ENSG00000072110    ENST00000553659
> ENSG00000072110    ENST00000556203
>
> Result using GeneID, TranscriptID and UniProtID:
> Ensembl Gene ID    Ensembl Transcript ID    UniProt/SwissProt Accession
> ENSG00000072110    ENST00000193403          P12814
> ENSG00000072110    ENST00000394419          P12814
> ENSG00000072110    ENST00000438964          P12814
> ENSG00000072110    ENST00000376839
> ENSG00000072110    ENST00000555075
> ENSG00000072110    ENST00000538545
> ENSG00000072110    ENST00000544964
> ENSG00000072110    ENST00000553290
> ENSG00000072110    ENST00000555616
> ENSG00000072110    ENST00000556433
> ENSG00000072110    ENST00000553370
> ENSG00000072110    ENST00000553779
> ENSG00000072110    ENST00000556571
> ENSG00000072110    ENST00000553659
>
>
>
> Please notice that the transcripts found for the Gene ENSG00000072110
> differ between the two cases (e.g. ENST00000556083 is not present in the
> second result).
>
> For this analysis I use the current Biomart version. This problem does not
> occur if I use the older Biomart (hg18, Biomart archive from 2009, NCBI36).
>
> Am I missing something?
>
> Thank you very much in advance.
>
> Ivan
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe):
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
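
A note on the join behaviour described in Rhoda's reply: the sketch below is
only an illustration, not the actual BioMart schema or build tooling. It
assumes two hypothetical tables, "transcript" and "translation", and uses
Python's built-in sqlite3 module to show how an inner join drops transcripts
that have no translation row, while a left join keeps them with an empty
accession. The IDs G1, T1-T3 and P00001 are made up for the example.

```python
import sqlite3

# Toy in-memory database. Table and column names are hypothetical and do
# not reflect the real Ensembl mart schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transcript (transcript_id TEXT PRIMARY KEY, gene_id TEXT);
CREATE TABLE translation (transcript_id TEXT, uniprot TEXT);
-- T1: protein coding, with a SwissProt accession
-- T2: protein coding, no SwissProt accession
-- T3: non-coding, so it has no translation row at all
INSERT INTO transcript VALUES ('T1', 'G1'), ('T2', 'G1'), ('T3', 'G1');
INSERT INTO translation VALUES ('T1', 'P00001'), ('T2', NULL);
""")


def rows(join_type):
    """Return (gene, transcript, accession) rows for the given join type."""
    sql = f"""
        SELECT t.gene_id, t.transcript_id, x.uniprot
        FROM transcript AS t
        {join_type} JOIN translation AS x
            ON x.transcript_id = t.transcript_id
        ORDER BY t.transcript_id
    """
    return conn.execute(sql).fetchall()


inner_rows = rows("INNER")  # what adding the UniProt attribute behaves like
left_rows = rows("LEFT")    # what would be needed to keep every transcript

for label, result in (("inner", inner_rows), ("left", left_rows)):
    print(label, [r[1] for r in result])
```

With the inner join, T3 disappears from the result even though no filter was
requested, which mirrors the 21 versus 14 transcripts seen in the thread. T2
is still returned with an empty accession, like the rows in the second result
that have no UniProt/SwissProt value.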