[ensembl-dev] RE Lesser annotation names in sscrofa in V72??

Ed gray_ed at hotmail.com
Wed Jun 26 04:21:41 BST 2013


Thanks Bert, I'm kind of a detail-oriented guy who looks at specific
situations when I manage data.  So while your explanation sounds reasonable,
let's look at the first of the two examples I chose: C4A

 

Vega: Search Vega for C4, and you get nothing related to C4A.  Search for
C4A and C4B, and you get useful information, specifically C4A
OTTSUSG00000001346.

Uniprot: Searching for C4: After leafing through pages of tobacco and
tomatoes, I found F1RQW2 (F1RQW2_PIG) that says "The sequence shown here is
derived from an Ensembl automatic analysis pipeline and should be considered
as preliminary data." Search for C4A, leads to the various C4As in mostly
mammals, and I find A5A8W8 (A5A8W8_PIG) with no Cautions under the General
annotation (Comments).  So clearly Uniprot has them both, one from a curated
source and one from ensembl's preliminary predictions.

 

Search for C4 in Pig on Ensembl.org, and you get 4 genes ("4 Genes match
your query ('C4') in Pig"), one of which is ENSSSCG00000001427, which maps
to C4A OTTSUSG00000001346 in Vega.

 

Search for C4A in ensembl.org, and you get 

Results Summary - Your search of Pig with 'C4A' returned no results.

 

Taken together, it would seem your predicted protein "C4" made it to Uniprot
and then somehow that allowed you to think deleting the name C4A was in
order.

 

Finally, just look at ncbi for C4 and C4A:
http://www.ncbi.nlm.nih.gov/gquery/?term=C4A vs
http://www.ncbi.nlm.nih.gov/gquery/?term=C4 .  Which is the correct name to
use??

 

So while I understand, appreciate and support your desire to make sense of
the big uncoordinated mess we all have called gene annotations, in this case
it seems that there is something not entirely productive with your approach,
since it seems to add more confusion.

 

Also since C4A is a valid protein and apparently gene name in Uniprot, I
don't think I can report a bug to them.  Thank you for your attention and
assistance!

 

Ed

 

 

From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of
Bert Overduin
Sent: Monday, June 24, 2013 1:48 PM
To: Ensembl developers list
Subject: Re: [ensembl-dev] RE Lesser annotation names in sscrofa in V72??

 

Dear Ed,

 

I am afraid this is not something that can be changed in Ensembl. As Magali
explained, Ensembl retrieves the gene names from UniProt and they have
removed the names in question. So, if you want to enquire whether these gene
names can be reinstated, you have to contact UniProt at
help at uniprot.org. Unless the removal is a bug, I
assume they must have had a valid reason to do this, though.

 

I appreciate that this possibly is not the answer you'd like to hear, but
these kinds of things have to be fixed at the source, otherwise it would
become one big uncoordinated mess. I hope this makes sense.

 

With kind regards,

Bert

 

On Mon, Jun 24, 2013 at 6:35 PM, Ed <gray_ed at hotmail.com> wrote:

 

 

From: Ed [mailto:gray_ed at hotmail.com] 
Sent: Monday, June 24, 2013 9:35 AM
To: 'Magali'; 'dev at ensembl.org'
Subject: RE: [ensembl-dev] Lesser annotation names in sscrofa in V72??

 

Dear Magali,

 

Thank you so much for responding.  Your answer seems reasonable.

 

I guess I was a little taken aback that a number of gene names removed were,
I believe, the preferred HUGO name for the gene.

 

Take for instance C4A.  Changing it to C4 creates some confusion in the
Ensembl comparative genomics section about what is mapped.  C4B seems
clearly mapped to C4B, while it looks like what is now C4 (used to be C4A)
maps to both C4A and C4B.

 

Another example is PDX-1, an important gene for sure.  Recent news stories
(http://www.telegraph.co.uk/science/science-news/8584443/Pigs-could-grow-human-organs-in-stem-cell-breakthrough.html )
are based on work on the PDX-1 gene.  While HUGO has the gene listed as
PDX1, PDX-1 is an approved synonym and many of the papers in academic
journals used (and still use) PDX-1.

 

Those two stood out after only a 30-second review of the eliminated names
by a single researcher.

 

Thank you for your consideration and explanation.  However, I'd suggest you
may want to reconsider.

 

Best wishes,

Ed

 

 

From: Magali [mailto:mr6 at ebi.ac.uk] 
Sent: Friday, June 21, 2013 4:57 AM
To: Ensembl developers list
Cc: Ed
Subject: Re: [ensembl-dev] Lesser annotation names in sscrofa in V72??

 

Hi Ed,

In release 72, we updated the external references for pig.

This means we use the latest sets of data from external sources, like
Uniprot for example.
A number of gene names assigned in Uniprot have been removed in the latest
set used in 72, causing the drop you notice.

The number of assigned external references is still comparable or higher
than in 71, but no trustworthy gene name was available for these
annotations.

See http://www.uniprot.org/uniprot/F1SF30?version=12&version=13
for example.
Uniprot entry F1SF30 in pig has had its name removed between versions 12 and
13.
So we still assign that uniprot entry to ensembl gene ENSSSCG00000015625,
but we cannot deduce a gene name from it.
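
To illustrate the behaviour described above, here is a minimal sketch. It is
not Ensembl's actual pipeline code: the record layout, the functions
`deduce_gene_name` and `annotate`, and the symbol "EXAMPLE1" are all
hypothetical. Only the accession F1SF30 and the gene ID ENSSSCG00000015625
come from this thread.

```python
from typing import Optional

# Hypothetical, simplified view of one UniProt entry at two versions.
# "EXAMPLE1" is a placeholder symbol, not the real former name.
ENTRY_V12 = {"accession": "F1SF30", "gene_name": "EXAMPLE1"}
ENTRY_V13 = {"accession": "F1SF30", "gene_name": None}


def deduce_gene_name(xref: dict) -> Optional[str]:
    """Return the gene name carried by an xref, or None if it has none."""
    name = xref.get("gene_name")
    if name and name.strip():
        return name.strip()
    return None


def annotate(gene_id: str, xref: dict) -> dict:
    """Keep the xref link in every case; attach a name only if one exists."""
    return {
        "gene_id": gene_id,
        "uniprot": xref["accession"],
        "name": deduce_gene_name(xref),
    }


before = annotate("ENSSSCG00000015625", ENTRY_V12)
after = annotate("ENSSSCG00000015625", ENTRY_V13)
print(before["name"], after["name"], after["uniprot"])
```

In this sketch the link to the UniProt entry survives the update, but the
name field comes back empty once the source no longer provides one.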


Hope that helps,
Magali

On 20/06/13 19:27, Ed wrote:

Dear Ensembl developers,
 
We extract ensembl annotations for sus scrofa 10.2 and load them into a
GBrowse database.  When we updated to ensembl V71, there were a few hundred
additional annotations, a generally expected result.
 
A test load of ensembl 72 yielded an unexpected result, fewer annotations.
The 1241 'names' I believe were removed are listed below.
 
I normally expect a few additional annotations and seeing so many removed
was a little unsettling.
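
For anyone wanting to reproduce this kind of comparison, a minimal sketch
follows. It assumes the names from each release have already been extracted
into plain lists; the function names and the toy data are illustrative only
and are not the actual 1241-name list.

```python
def removed_names(old_names, new_names):
    """Names present in the older release but absent from the newer one."""
    return sorted(set(old_names) - set(new_names))


def added_names(old_names, new_names):
    """Names present in the newer release but absent from the older one."""
    return sorted(set(new_names) - set(old_names))


# Toy data for illustration; real input would be the name column
# extracted from each release's annotation dump.
v71 = ["C4A", "C4B", "PDX-1"]
v72 = ["C4", "C4B"]

print(removed_names(v71, v72))  # ['C4A', 'PDX-1']
print(added_names(v71, v72))    # ['C4']
```

A set difference ignores duplicates and ordering, so it reports only names
that truly disappeared or appeared between the two loads.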
 
Any ideas or comments if this is a bug or a feature?
 
Ed


_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info:
http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/





 

-- 
Bert Overduin, Ph.D.
Vertebrate Genomics Team

EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom

http://www.ebi.ac.uk/~bert

Ensembl browser: http://www.ensembl.org

Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html

Blog: http://www.ensembl.info

YouTube: http://www.youtube.com/user/EnsemblHelpdesk
Facebook: http://www.facebook.com/Ensembl.org
Twitter: http://twitter.com/Ensembl 
