[ensembl-dev] [Ensembl-compara] Extracting orthologous proteins from multiple alignments

Benjamin Dubreuil dubreuil.benjamin at hotmail.com
Tue May 28 22:16:20 BST 2013


Hi,

I am trying to find a way to align all orthologous proteins from 18 mammalian species to their human counterparts.
I've read the description of the Gene Orthology/Paralogy prediction method.
In step 4 of that pipeline, a multiple alignment of the protein sequences is built for each cluster of genes (the clusters are based on BLAST scores):
For each cluster, build a multiple alignment based on the protein sequences using a combination of multiple aligners, consensified by M-Coffee
The alignments are available for download in FASTA or EMF format, and I've downloaded them. I then filtered out all protein sequences that don't belong to one of the 18 mammalian species I'm focusing on.
However, I still have a problem with the paralogous proteins: I can't find an efficient way to get rid of them.

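For one of the alignments, these are the gene IDs of the mammalian sequences, each followed by the gene ID of its human ortholog where one exists (format: gene ID|human ortholog gene ID; the two IDs without a '|' are the human genes themselves):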
ENSBTAG00000001468|ENSG00000020577
ENSBTAG00000009785|ENSG00000179134
ENSCAFG00000005568|ENSG00000179134
ENSCAFG00000014940|ENSG00000020577
ENSECAG00000010681|ENSG00000020577
ENSECAG00000015551|ENSG00000179134
ENSFCAG00000010739|ENSG00000179134
ENSG00000020577
ENSG00000179134
ENSGGOG00000000673|ENSG00000020577
ENSGGOG00000024088|ENSG00000179134
ENSLAFG00000011957|ENSG00000179134
ENSLAFG00000013245|ENSG00000020577
ENSMODG00000011669|ENSG00000020577
ENSMODG00000013478|ENSG00000179134
ENSMPUG00000005514|ENSG00000020577
ENSMPUG00000017729|ENSG00000179134
ENSMUSG00000021838|ENSG00000020577
ENSMUSG00000037513|ENSG00000179134
ENSOCUG00000000343|ENSG00000179134
ENSOCUG00000003420|ENSG00000020577
ENSOGAG00000009678|ENSG00000020577
ENSOGAG00000032553|ENSG00000179134
ENSPPYG00000005832|ENSG00000020577
ENSPPYG00000009963|ENSG00000179134
ENSPTRG00000006364|ENSG00000020577
ENSPTRG00000010961|ENSG00000179134
ENSPVAG00000011118|ENSG00000179134
ENSPVAG00000016060|ENSG00000020577
ENSRNOG00000010489|ENSG00000020577
ENSRNOG00000019831|ENSG00000179134
ENSSSCG00000010706|ENSG00000179134
ENSSSCG00000016927|ENSG00000179134
ENSSSCG00000023408|ENSG00000020577
ENSSTOG00000003379|ENSG00000179134
ENSSTOG00000015276|ENSG00000020577
ENSTTRG00000003554|ENSG00000179134
ENSTTRG00000008280|ENSG00000020577
So I don't know which human gene I should select (ENSG00000179134 or ENSG00000020577).
Should I split this alignment in two?
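If the alignment is split, one way to do it is to put each mammalian gene into a group keyed by the human gene it is listed against. The following is only a minimal Python sketch, assuming the "gene ID|human ortholog gene ID" lines shown in this post and assuming the Ensembl stable-ID convention "ENS" + species code + "G" + digits (human IDs such as ENSG00000020577 have an empty species code). The names species_code and group_by_human_ortholog are hypothetical helpers, not part of any Ensembl API.

```python
import re
from collections import defaultdict

# Assumption: Ensembl gene stable IDs are "ENS" + species code + "G" + digits,
# e.g. ENSMUSG00000021838; human IDs (ENSG...) have an empty species code.
GENE_ID = re.compile(r"^ENS([A-Z]*)G\d+$")


def species_code(gene_id):
    """Return the species code embedded in an Ensembl gene ID ("" for human)."""
    match = GENE_ID.match(gene_id)
    if match is None:
        raise ValueError(f"not an Ensembl gene ID: {gene_id!r}")
    return match.group(1)


def group_by_human_ortholog(lines):
    """Split 'gene|human_ortholog' lines into one group per human gene.

    Lines without '|' (the human genes themselves) are skipped.
    Returns {human gene ID: {species code: [gene IDs]}}.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.strip()
        if "|" not in line:
            continue
        gene, human = line.split("|", 1)
        groups[human][species_code(gene)].append(gene)
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {human: dict(by_species) for human, by_species in groups.items()}


if __name__ == "__main__":
    sample = [
        "ENSMUSG00000021838|ENSG00000020577",
        "ENSMUSG00000037513|ENSG00000179134",
        "ENSG00000020577",
        "ENSSSCG00000010706|ENSG00000179134",
        "ENSSSCG00000016927|ENSG00000179134",
    ]
    for human, by_species in group_by_human_ortholog(sample).items():
        print(human, by_species)
```

Each group could then be used to pull the matching sequences out of the downloaded alignment, giving one sub-alignment per human gene. A species that has two genes under the same human gene (here the two ENSSSCG genes under ENSG00000179134) is kept as a list, so it still has to be resolved separately.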
My final goal is to have a human protein aligned with at least 10 orthologous proteins, each from a different species among the 18 mammalian species I'm studying.
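The goal above (one human gene with orthologs from at least 10 different species) can be checked group by group. This is a minimal Python sketch, assuming the genes have already been grouped into a mapping shaped like human gene ID -> {species code -> [ortholog gene IDs]}; the function name usable_groups is hypothetical. As a conservative design choice, a species with more than one candidate for the same human gene is simply dropped from that group rather than resolved.

```python
def usable_groups(groups, min_species=10):
    """Keep human genes that have a single ortholog in at least `min_species` species.

    `groups` maps human gene ID -> {species code -> [ortholog gene IDs]}.
    Species with more than one candidate for the same human gene are left
    out of that group, because no single ortholog can be picked for them.
    Returns {human gene ID: {species code: ortholog gene ID}}.
    """
    result = {}
    for human, by_species in groups.items():
        # Keep only species that contribute exactly one gene to this group.
        one_to_one = {
            species: genes[0]
            for species, genes in by_species.items()
            if len(genes) == 1
        }
        if len(one_to_one) >= min_species:
            result[human] = one_to_one
    return result


if __name__ == "__main__":
    example = {
        "ENSG00000179134": {
            "MUS": ["ENSMUSG00000037513"],
            "SSC": ["ENSSSCG00000010706", "ENSSSCG00000016927"],
        },
    }
    # Only one single-copy species here, so the default threshold rejects it.
    print(usable_groups(example))                 # {}
    print(usable_groups(example, min_species=1))  # {'ENSG00000179134': {'MUS': 'ENSMUSG00000037513'}}
```

In the list above, ENSG00000020577 has one gene from each of 17 species (no ENSFCAG gene), and ENSG00000179134 has one gene from each of 17 species plus two ENSSSCG genes, so under this rule both human genes would pass the 10-species threshold. Picking one of several candidates (for example by alignment score) instead of dropping the species would be an alternative not shown here.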
What would be the best way to do this? Any suggestions? Am I going about it the wrong way?
Best.
Dubreuil Benjamin 		 	   		  