[ensembl-dev] Projecting protein names and GO terms across species
mag
mr6 at ebi.ac.uk
Thu Mar 13 17:17:58 GMT 2014
Hi Avril,
The system we are using is very similar to what you are describing.
The main differences I can think of are:
- we only project HGNC, MGI and ZFIN_ID gene names, not UniProt or RefSeq
- we only project between one-to-one orthologs for most species;
for fish species, we project between one-to-many orthologs
- for ontologies, we project the same GO terms directly and do not look
for a common ancestor; we do, however, filter based on species
(mammal-specific GO terms should not be projected onto birds, for example)
For this, there is a taxon-based constraint filter provided by GO:
http://www.ebi.ac.uk/QuickGO/GValidate?service=taxon&action=getConstraints
We also try, whenever possible, to project only between relatively close
species, so we do not include worm, fruit fly or sea squirt in the
projections.
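
The rules above can be sketched roughly as follows. This is only an illustrative Python sketch, not the actual Ensembl projection code: the Ortholog class, the function names, the toy GO identifiers and the only_in_taxon table are all made up for the example, and it assumes that "for fish species" refers to the target species and that the orthology-type rule applies to GO terms as well as names.

```python
from dataclasses import dataclass

# Gene-name sources that are projected, as listed in the message above.
NAME_SOURCES = {"HGNC", "MGI", "ZFIN_ID"}


@dataclass(frozen=True)
class Ortholog:
    """Hypothetical record for one source-gene/target-gene orthology."""
    source_gene: str
    target_gene: str
    orthology_type: str   # e.g. "ortholog_one2one", "ortholog_one2many"
    name_source: str      # database the source gene name comes from
    go_terms: tuple       # GO terms annotated on the source gene


def allowed_orthology_types(target_is_fish):
    # Assumption: one-to-many is accepted only when the target is a fish.
    if target_is_fish:
        return {"ortholog_one2one", "ortholog_one2many"}
    return {"ortholog_one2one"}


def can_project_name(ortholog, target_is_fish):
    """True if the source gene name may be projected onto the target gene."""
    return (ortholog.name_source in NAME_SOURCES
            and ortholog.orthology_type in allowed_orthology_types(target_is_fish))


def projectable_go_terms(ortholog, target_lineage, only_in_taxon, target_is_fish):
    """Return the source GO terms unchanged, minus taxon-restricted ones.

    only_in_taxon maps a GO term to the clade it is restricted to
    (a toy stand-in for the GO taxon constraints); target_lineage is the
    set of clades the target species belongs to.
    """
    if ortholog.orthology_type not in allowed_orthology_types(target_is_fish):
        return []
    kept = []
    for term in ortholog.go_terms:
        required_clade = only_in_taxon.get(term)
        if required_clade is None or required_clade in target_lineage:
            kept.append(term)
    return kept


# Toy data: identifiers are placeholders, not real GO accessions.
only_in_taxon = {"GO:toy_mammal_only": "Mammalia"}
ortholog = Ortholog("human_gene", "target_gene", "ortholog_one2one",
                    "HGNC", ("GO:toy_any", "GO:toy_mammal_only"))
chicken_lineage = {"Aves", "Vertebrata"}
mouse_lineage = {"Mammalia", "Vertebrata"}

print(projectable_go_terms(ortholog, chicken_lineage, only_in_taxon, False))
print(projectable_go_terms(ortholog, mouse_lineage, only_in_taxon, False))
```

With the toy data, the mammal-only term is dropped for the bird target and kept for the mouse target; no ancestor terms are ever substituted.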
Hope that helps,
Magali
On 13/03/2014 09:17, alc wrote:
>
> Dear Ensembl developers and users,
>
> I'm involved in some helminth genome sequencing projects in my group,
> and my colleague (Eleanor Stanley) has built an in-house Compara database
> for these genomes, from which we have inferred orthologs.
>
> I'm planning to project protein names and GO terms across species.
> I know that the Ensembl team do this already, but can't find many
> details of how it's done on the web.
>
> I'm wondering whether my plan is very different from the Ensembl one,
> here is what I'm thinking of doing:
>
> (i) Projecting protein names: for each gene in a query species (eg.
> Strongyloides ratti), identify its one-to-one and many-S.ratti-to-one
> orthologs in C. elegans, S. mansoni, human, D. melanogaster, zebrafish
> in our local Compara database. Take a protein name from a curated
> UniProt entry for one of these orthologs (taking orthologs from those
> species in order of preference given above), and project it to the
> query gene. Give the projected protein name evidence code ECO:0000265
> and give the UniProt accession of the source protein. If the same
> protein name is projected to several query genes, then number them
> with Arabic numerals, as described in the UniProt protein naming guide
> (www.uniprot.org/docs/nameprot). I couldn't find much information on
> the web about how Ensembl projects protein names, so I am wondering:
> is this very different?
>
> (ii) Projecting GO terms: for each gene in a query species (eg.
> Strongyloides ratti), identify all its orthologs (one-to-one,
> one-to-many, many-to-one, many-to-many) in C. elegans, S. mansoni,
> human, D. melanogaster, zebrafish in our local Compara database. Take
> manually curated GO terms of types IDA/IEP/IGI/IMP/IPI (excluding
> 'protein binding') from the orthologs. For each pair of ortholog genes
> from two different species, find the last common ancestors of their GO
> terms in the GO hierarchy: project these ancestral GO terms to the
> query gene. Do this for each pair of ortholog genes from two different
> species. Give the projected GO terms evidence code 'IEA' and give the
> UniProt accessions of the source proteins. [Note: by transferring the
> last common ancestors of GO terms from orthologs from two different
> species, I hope to be conservative and just project GO terms that are
> likely to be conserved across species.] I found some information on
> how Ensembl project GO terms on the web
> (http://www.ebi.ac.uk/GOA/compara_go_annotations),
> but am not sure if the GO hierarchy is used at all as in my idea, or
> if all GO terms are directly projected from orthologs to the query gene?
>
> Is this very different to what the Ensembl team are doing? I would be
> very grateful to hear of any differences.
>
> Kind Regards,
>
> Avril
>
> Avril Coghlan
>
> Parasite Genomics Team
>
> Sanger Institute
>