[ensembl-dev] Projecting protein names and GO terms across species

mag mr6 at ebi.ac.uk
Thu Mar 13 17:17:58 GMT 2014


Hi Avril,

The system we are using is very similar to what you are describing.

The main differences I can think of are:

- we only project HGNC, MGI and ZFIN_ID gene names, not UniProt or RefSeq

- for most species, we only project between one-to-one orthologs;
for fish species, we project between one-to-many orthologs

- for ontologies, we keep the same term and do not look for the ancestor;
we do, however, filter based on species (mammal-specific GO terms should 
not be projected onto birds, for example).
For this, there is a taxon-based constraint filter provided by GO:
http://www.ebi.ac.uk/QuickGO/GValidate?service=taxon&action=getConstraints

We also try, whenever possible, to project only between relatively close 
species, so we do not include worm, fruit fly or sea squirt in the 
projections.
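
As a rough illustration of the rules above, here is a minimal Python sketch. It is not the Ensembl pipeline code: every function name, the orthology-type labels, the species labels and the layout of the constraint table are assumptions made for the example. It also assumes that, for fish, one-to-many orthologs are accepted in addition to one-to-one, and it models the GO taxon constraints as simple "only_in" / "never_in" sets.

```python
# Sketch only: all names below are hypothetical, not Ensembl's pipeline code.

# Gene-name sources that are projected (per the list above).
PROJECTED_NAME_SOURCES = {"HGNC", "MGI", "ZFIN_ID"}

# Species left out of projections because they are too distant.
EXCLUDED_SPECIES = {"worm", "fruitfly", "seasquirt"}


def allowed_orthology_types(target_is_fish):
    """Orthology types accepted for a target species.

    Assumption: for fish, one-to-many is accepted as well as one-to-one.
    """
    if target_is_fish:
        return {"one-to-one", "one-to-many"}
    return {"one-to-one"}


def can_project_name(source_db, orthology_type, source_species,
                     target_species, target_is_fish=False):
    """Return True if a gene name may be projected along this ortholog pair."""
    if source_db not in PROJECTED_NAME_SOURCES:
        return False
    if {source_species, target_species} & EXCLUDED_SPECIES:
        return False
    return orthology_type in allowed_orthology_types(target_is_fish)


def passes_taxon_constraints(go_term, target_lineage, constraints):
    """Check one GO term against 'only_in' / 'never_in' taxon sets.

    target_lineage is the set of taxa the target species belongs to.
    Terms without an entry in `constraints` are unconstrained.
    """
    rule = constraints.get(go_term, {})
    only_in = rule.get("only_in", set())
    never_in = rule.get("never_in", set())
    if only_in and not (only_in & target_lineage):
        return False
    if never_in & target_lineage:
        return False
    return True


def filter_go_terms(source_terms, target_lineage, constraints):
    """Keep the source gene's GO terms unchanged, minus taxon violations."""
    return [t for t in source_terms
            if passes_taxon_constraints(t, target_lineage, constraints)]


# Toy data: the term IDs are placeholders, not real GO accessions.
toy_constraints = {"GO:mammal_only": {"only_in": {"Mammalia"}}}
chicken_lineage = {"Eukaryota", "Metazoa", "Vertebrata", "Aves"}
kept = filter_go_terms(["GO:mammal_only", "GO:unconstrained"],
                       chicken_lineage, toy_constraints)
print(kept)  # ['GO:unconstrained']
```

In practice the constraint table would be built from the taxon constraints published by GO rather than written by hand.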


Hope that helps,
Magali

On 13/03/2014 09:17, alc wrote:
>
> Dear Ensembl developers and users,
>
> I'm involved in some helminth genome sequencing projects in my group, 
> and my colleague (Eleanor Stanley) has built an in-house Compara database 
> for these genomes, from which we have inferred orthologs.
>
> I'm planning to project protein names and GO terms across species. 
> I know that the Ensembl team do this already, but can't find many 
> details of how it's done on the web.
>
> I'm wondering whether my plan is very different from the Ensembl one. 
> Here is what I'm thinking of doing:
>
> (i) Projecting protein names: for each gene in a query species (e.g. 
> Strongyloides ratti), identify its one-to-one and many-S.ratti-to-one 
> orthologs in C. elegans, S. mansoni, human, D. melanogaster, zebrafish 
> in our local Compara database. Take a protein name from a curated 
> UniProt entry for one of these orthologs (taking orthologs from those 
> species in order of preference given above), and project it to the 
> query gene. Give the projected protein name evidence code ECO:0000265 
> and give the UniProt accession of the source protein. If the same 
> protein name is projected to several query genes, then number them 
> with Arabic numerals, as described in the UniProt protein naming guide 
> (www.uniprot.org/docs/nameprot). I couldn't find much information on 
> the web about how Ensembl project protein names, so am wondering: is 
> this very different?
>
> (ii) Projecting GO terms: for each gene in a query species (e.g. 
> Strongyloides ratti), identify all its orthologs (one-to-one, 
> one-to-many, many-to-one, many-to-many) in C. elegans, S. mansoni, 
> human, D. melanogaster, zebrafish in our local Compara database. Take 
> manually curated GO terms of types IDA/IEP/IGI/IMP/IPI (excluding 
> 'protein binding') from the orthologs. For each pair of ortholog genes 
> from two different species, find the last common ancestors of their GO 
> terms in the GO hierarchy: project these ancestral GO terms to the 
> query gene. Do this for each pair of ortholog genes from two different 
> species. Give the projected GO terms evidence code 'IEA' and give the 
> UniProt accessions of the source proteins. [Note: by transferring the 
> last common ancestors of GO terms from orthologs from two different 
> species, I hope to be conservative and just project GO terms that are 
> likely to be conserved across species.] I found some information on 
> how Ensembl project GO terms on the web 
> (http://www.ebi.ac.uk/GOA/compara_go_annotations), 
> but am not sure if the GO hierarchy is used at all as in my idea, or 
> if all GO terms are directly projected from orthologs to the query gene?
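
For comparison, the ancestor-based idea in (ii) above could be sketched roughly as follows. This illustrates the approach proposed in the quoted message, not what the Ensembl projection does (as noted, Ensembl keeps the same term). The function names and the toy hierarchy are made up for the example; a real run would load the parent relations from the GO ontology file. "Last common ancestors" is taken here to mean the shared ancestors that are not themselves ancestors of another shared ancestor, and a term counts as its own ancestor.

```python
# Sketch only: hypothetical helper names and a toy hierarchy, not real GO data.

def ancestors(term, parents):
    """Return the term plus every term reachable by following parent links."""
    seen = {term}
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def last_common_ancestors(term_a, term_b, parents):
    """Most specific terms shared by the ancestries of term_a and term_b."""
    common = ancestors(term_a, parents) & ancestors(term_b, parents)
    # A shared term is redundant if it is a proper ancestor of another
    # shared term, because that other term is more specific.
    redundant = set()
    for term in common:
        redundant |= ancestors(term, parents) - {term}
    return common - redundant


# Toy hierarchy (child -> list of parents); labels are placeholders.
toy_parents = {
    "zinc_binding": ["ion_binding"],
    "calcium_binding": ["ion_binding"],
    "ion_binding": ["binding"],
    "binding": ["root"],
}

print(last_common_ancestors("zinc_binding", "calcium_binding", toy_parents))
# {'ion_binding'}
```

Because GO is a directed acyclic graph rather than a tree, two terms can have more than one last common ancestor, which is why the function returns a set.
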
>
> Is this very different to what the Ensembl team are doing? I would be 
> very grateful to hear of any differences.
>
> Kind Regards,
>
> Avril
>
> Avril Coghlan
>
> Parasite Genomics Team
>
> Sanger Institute
>
