[ensembl-dev] Bug or User error with filtering?

pip pipster pipsterpip at yahoo.com
Wed Aug 24 15:46:59 BST 2011


I am sending this thread to the Ensembl mailing list because the issue appears to be related to Ensembl data. Any ideas why the Ensembl transcripts aren't mapped correctly to the GenBank protein accessions?

Thank you for the help.

Best regards,
Phillipe


----- Forwarded Message -----
From: Elena Rivkin <Elena.Rivkin at oicr.on.ca>
To: pip pipster <pipsterpip at yahoo.com>
Cc: Junjun Zhang <Junjun.Zhang at oicr.on.ca>
Sent: Tuesday, August 23, 2011 1:11 PM
Subject: Re: [BioMart Users] Bug or User error with filtering?


Hi Phillipe,
From the data you provided, as you said, it looks like these Ensembl transcripts (ENST00000169293 and many others in similar categories) are not mapped to the GenBank protein accessions, and are therefore not retrieved by queries to BioMart.

Unfortunately, I don't know why that is. I recommend forwarding your question to the Ensembl helpdesk; they might be able to assist you in this matter.

Thank you. 

Elena Rivkin, PhD
Outreach and Training Coordinator, Informatics and Bio-computing

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel: 647-258-4316
Toll-free: 1-866-678-6427
www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
 
From:  pip pipster <pipsterpip at yahoo.com>
Reply-To:  pip pipster <pipsterpip at yahoo.com>
Date:  Tue, 23 Aug 2011 11:50:00 -0400
To:  Microsoft Office User <Elena.Rivkin at oicr.on.ca>, Junjun Zhang <Junjun.Zhang at oicr.on.ca>, "users at biomart.org" <users at biomart.org>
Cc:  Rhoda Kinsella via RT <helpdesk at ensembl.org>
Subject:  Re: [BioMart Users] Bug or User error with filtering?


Elena,
You should be able to follow the chain of links below to get the accession numbers.

a.  From Transcript
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=ENST00000169293


b.  To Gene (link to this Gene URL is located on Transcript link above)
http://www.ncbi.nlm.nih.gov/nuccore/D28593.1


c.  To Protein (link to this Protein URL is located on Gene link above)
http://www.ncbi.nlm.nih.gov/protein/471128


From this standpoint, I am led to believe that the transcript maps to a GenBank protein accession and should not be filtered out by the $query->addFilter("with_protein_id", ["Only"]) filter. Either way, I would like to understand why it is being filtered out, since I have to be able to trust the data I get back and deal with it accordingly.
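
For readers who want to reproduce the filter outside the Perl API, here is a minimal sketch in Python of the same query in BioMart's XML query format. The helper name build_query is hypothetical, and the dataset name (hsapiens_gene_ensembl) and attribute names (ensembl_gene_id, ensembl_transcript_id) are assumptions that do not appear in this thread; only the filter name with_protein_id comes from the Perl call above. The sketch also assumes that boolean filters take an excluded attribute, with "0" meaning "Only" and "1" meaning "Excluded".

```python
import xml.etree.ElementTree as ET


def build_query(dataset, attributes, boolean_filters=None):
    """Build a BioMart XML query string (hypothetical helper).

    boolean_filters maps a filter name to "only" or "excluded".
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, mode in (boolean_filters or {}).items():
        if mode not in ("only", "excluded"):
            raise ValueError("mode must be 'only' or 'excluded'")
        # Assumption: excluded="0" keeps rows that have the annotation,
        # excluded="1" keeps rows that lack it.
        flag = "1" if mode == "excluded" else "0"
        ET.SubElement(ds, "Filter", {"name": name, "excluded": flag})
    for attr in attributes:
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


# Dataset and attribute names below are assumed, not taken from the thread.
ATTRS = ["ensembl_gene_id", "ensembl_transcript_id"]
ONLY_QUERY = build_query("hsapiens_gene_ensembl", ATTRS,
                         {"with_protein_id": "only"})
EXCLUDED_QUERY = build_query("hsapiens_gene_ensembl", ATTRS,
                             {"with_protein_id": "excluded"})
```

Running both queries and checking which result set contains ENST00000169293 would show directly whether the mart treats the transcript as having a GenBank protein accession.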

Likewise, the following URL also appears to chain the Gene to the proper transcripts.
http://www.ebi.ac.uk/ena/data/view/D28593


It appears that, for some reason, the data in Ensembl does not map transcript ENST00000169293 (and many others in similar categories) to the proper protein accession. But that's just my theory, and I would love to understand it better. Thoughts?

Best regards,
Phillipe






________________________________
From: Elena Rivkin <Elena.Rivkin at oicr.on.ca>
To: pip pipster <pipsterpip at yahoo.com>; Junjun Zhang <Junjun.Zhang at oicr.on.ca>; "users at biomart.org" <users at biomart.org>
Cc: Rhoda Kinsella via RT <helpdesk at ensembl.org>
Sent: Monday, August 22, 2011 2:04 PM
Subject: Re: [BioMart Users] Bug or User error with filtering?


Hi Phillipe,
When entering GenBank protein ID BAA05928 and retrieving the Ensembl gene ID and transcript ID, I get the following:
ENSG00000127241, ENST00000337774

When entering GenBank protein ID CAC17726 and retrieving the Ensembl gene ID and transcript ID, I get the following:
ENSG00000127152, ENST00000357195

It appears that in the Ensembl mart that you are querying, these GenBank IDs correspond to different transcripts (although to the same gene IDs).
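
The comparison described above can be sketched in a few lines of Python. The helper name compare is hypothetical. The data are the IDs quoted in this thread, with Ensembl stable IDs written in their standard 11-digit form, and the expected genes are taken from the Ensembl URLs quoted further down.

```python
# Reverse-lookup results reported in this message:
# GenBank protein accession -> (Ensembl gene, Ensembl transcript)
mapped = {
    "BAA05928": ("ENSG00000127241", "ENST00000337774"),
    "CAC17726": ("ENSG00000127152", "ENST00000357195"),
}

# Transcripts the original poster expected for the same accessions.
expected = {
    "BAA05928": ("ENSG00000127241", "ENST00000169293"),
    "CAC17726": ("ENSG00000127152", "ENST00000345514"),
}


def compare(expected, mapped):
    """Classify each accession by how the mart mapping differs."""
    result = {}
    for accession, (gene, transcript) in expected.items():
        if accession not in mapped:
            result[accession] = "not mapped"
            continue
        mapped_gene, mapped_transcript = mapped[accession]
        if mapped_transcript == transcript:
            result[accession] = "same transcript"
        elif mapped_gene == gene:
            result[accession] = "same gene, different transcript"
        else:
            result[accession] = "different gene"
    return result


for accession, status in compare(expected, mapped).items():
    print(accession, status)
```

Both accessions fall into the "same gene, different transcript" case, which is consistent with the expected transcripts being dropped by a transcript-level filter.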
Regards, 
Elena Rivkin, PhD
 
From:  pip pipster <pipsterpip at yahoo.com>
Reply-To:  pip pipster <pipsterpip at yahoo.com>
Date:  Mon, 22 Aug 2011 13:51:56 -0400
To:  Junjun Zhang <Junjun.Zhang at oicr.on.ca>, "users at biomart.org" <users at biomart.org>
Cc:  Rhoda Kinsella via RT <helpdesk at ensembl.org>
Subject:  Re: [BioMart Users] Bug or User error with filtering?


Thank you Junjun.

Elena, to answer your question, I believe the NCBI links in the thread below include a link to the protein, where you can get the protein accession number. For example, for the two transcripts below you will find links to the following proteins. You will also see that the URLs correctly show the transcripts as protein coding.

http://www.ncbi.nlm.nih.gov/protein/471128 (accession BAA05928)
and
http://www.ncbi.nlm.nih.gov/protein/11558488 (accession CAC17726)

Thank you,
Phillipe



________________________________
From: Junjun Zhang <Junjun.Zhang at oicr.on.ca>
To: pip pipster <pipsterpip at yahoo.com>; "users at biomart.org" <users at biomart.org>
Cc: Rhoda Kinsella via RT <helpdesk at ensembl.org>
Sent: Monday, August 22, 2011 12:59 PM
Subject: Re: [BioMart Users] Bug or User error with filtering?


Hi Phillipe,

I am forwarding your questions to the Ensembl helpdesk. The Ensembl team is best placed to answer questions about the data content of the Ensembl databases.

Cheers,
Junjun

From:  Elena Rivkin <Elena.Rivkin at oicr.on.ca>
Date:  Mon, 22 Aug 2011 10:46:35 -0400
To:  pip pipster <pipsterpip at yahoo.com>, Rhoda Kinsella <rhoda at ebi.ac.uk>, "users at biomart.org" <users at biomart.org>
Subject:  Re: [BioMart Users] Bug or User error with filtering?


Hi Phillipe, 
>Can you let me know what the GenBank protein accessions are for these two transcripts? I can't find them.
>
>
>Thank you. 
>Elena Rivkin, PhD
> 
>
>From:  pip pipster <pipsterpip at yahoo.com>
>Reply-To:  pip pipster <pipsterpip at yahoo.com>
>Date:  Mon, 22 Aug 2011 10:32:43 -0400
>To:  Rhoda Kinsella <rhoda at ebi.ac.uk>, "users at biomart.org" <users at biomart.org>
>Subject:  Re: [BioMart Users] Bug or User error with filtering?
>
>
>
>After more investigation, something definitely isn't adding up. As it turns out, filtering by GenBank protein accession is what we want, and we need the ability to exclude. The two transcripts below are examples (they show up as protein coding in GenBank as well as in Ensembl), but there are thousands more like them. The filter below removes them even though they have a GenBank protein accession. What may be causing this?
>
>
>
>ENST00000169293
>http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=ENST00000169293
>http://www.ncbi.nlm.nih.gov/nuccore/D28593?
>
>http://useast.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000127241;r=3:186964149-187009745;t=ENST00000169293
>
>ENST00000345514
>http://www.ncbi.nlm.nih.gov/gene?term=ENST00000345514
>http://useast.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000127152;r=14:99635624-99737822;t=ENST00000345514
>
>
>
>
>Filter used:
>Manual (non-Perl)
>    Homo sapiens genes (GRCh37.p3)
>    Filters
>        with protein ID(s): Only
>    Attributes
>        Ensembl Gene ID
>        Ensembl Transcript ID
>
>
>Same problem occurs using Perl filter as well
>    $query->addFilter("with_protein_id", ["Only"]);
>
>
>
>________________________________
>From: pip pipster <pipsterpip at yahoo.com>
>To: Rhoda Kinsella <rhoda at ebi.ac.uk>
>Cc: "users at biomart.org" <users at biomart.org>
>Sent: Monday, August 22, 2011 8:07 AM
>Subject: Re: [BioMart Users] Bug or User error with filtering?
>
>
>Rhoda,
>Thank you for the feedback, very helpful.  The Gene Type filter, 'protein_coding' will likely work, however it doesn't allow me to do an 'exclude' type filter (i.e. give me everything except for the non protein-coding genes).  Do you know if you can still do an exclude using the method you described?
>
>
>Thank you!
>Phillipe
>
>
>
>________________________________
>From: Rhoda Kinsella <rhoda at ebi.ac.uk>
>To: pip pipster <pipsterpip at yahoo.com>
>Cc: "users at biomart.org" <users at biomart.org>
>Sent: Monday, August 22, 2011 5:04 AM
>Subject: Re: [BioMart Users] Bug or User error with filtering?
>
>
>Hi Phillipe
>You are filtering using the protein ID (GenBank protein accession), and as this Ensembl protein ID does not have a corresponding GenBank protein accession, you will not get this ENSP. Please filter using the Gene type filter and select protein_coding. That way you will get the ENSP data you require.
>Regards
>Rhoda
>
>
>
>
>On 21 Aug 2011, at 22:54, pip pipster wrote:
>
>We are seeing strange things occur with the protein ID filter. For example, transcript ENST00000345514 is being filtered out by the search below. However, you can see that it does have a protein ID, shown here: http://useast.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000127152;r=14:99635624-99737861;t=ENST00000345514 . Any idea why this is being filtered? Could this be a bug in BioMart or the data, or is it user error?
>>
>>Manual (non-Perl)
>>    Homo sapiens genes (GRCh37.p3)
>>    Filters
>>        with protein ID(s): Only
>>    Attributes
>>        Ensembl Gene ID
>>        Ensembl Transcript ID
>>
>>
>>Same problem occurs using Perl filter as well
>>    $query->addFilter("with_protein_id", ["Only"]);
>>
>>
>>Thank you,
>>Phillipe
>>
>
>Rhoda Kinsella Ph.D.
>Ensembl Bioinformatician,
>European Bioinformatics Institute (EMBL-EBI),
>Wellcome Trust Genome Campus, 
>Hinxton
>Cambridge CB10 1SD,
>UK. 
>
>
>
>
> 