[ensembl-dev] EnsEMBL compara / protein sequence alignments

Javier Herrero jherrero at ebi.ac.uk
Wed Nov 7 11:29:00 GMT 2012


BTW, we have an example script 
(ensembl-compara/scripts/examples/families_workshop_fetchFamilyAlignment.pl) 
that does something very similar to what you want (but just for one gene).
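
For readers of the archive, here is a minimal sketch of the per-protein output discussed in this thread. It is plain Python and does not call the Ensembl API or reproduce the example script. It assumes each family alignment has already been fetched into a list of (member_id, species, aligned_sequence) tuples, which is a hypothetical layout, and all IDs and sequences are toy values. It writes one FASTA file per query protein with the query first, optionally restricts the output to a subset of species, and drops columns left with only gaps after that filtering.

```python
import tempfile
from pathlib import Path


def drop_gap_only_columns(rows):
    """Remove alignment columns that contain only gaps in every row."""
    seqs = [seq for _, seq in rows]
    keep = [i for i in range(len(seqs[0])) if any(s[i] != "-" for s in seqs)]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in rows]


def write_query_alignments(families, query_species, out_dir, target_species=None):
    """Write one FASTA file per query protein, query as the first record.

    families: iterable of alignments, each a list of
              (member_id, species, aligned_sequence) tuples (assumed layout).
    target_species: optional set of species names to keep as non-query rows.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for members in families:
        queries = [m for m in members if m[1] == query_species]
        # A family with several query proteins yields one file per protein,
        # so the same alignment is dumped more than once.
        for q_id, _, q_seq in queries:
            others = [
                (m_id, seq)
                for m_id, species, seq in members
                if m_id != q_id
                and (target_species is None or species in target_species)
            ]
            rows = drop_gap_only_columns([(q_id, q_seq)] + others)
            path = out_dir / f"{q_id}.fa"
            path.write_text("".join(f">{name}\n{seq}\n" for name, seq in rows))
            written.append(path)
    return written


# Toy family: two query (cow) proteins and two proteins from other species.
toy_family = [
    ("cowA", "cow", "MK-A-"),
    ("cowB", "cow", "MKTA-"),
    ("humA", "human", "MK-AG"),
    ("musA", "mouse", "MR-A-"),
]

with tempfile.TemporaryDirectory() as tmp:
    for path in write_query_alignments([toy_family], "cow", tmp, {"human"}):
        print(path.name)
        print(path.read_text(), end="")
```

Naming the file after the query protein keeps the "one file per query protein" layout requested in the thread. A real script would take the family members from the Compara API instead of the toy list.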

Javier

On 07/11/12 11:25, Javier Herrero wrote:
> Hi Sabrina
>
> It is certainly possible to get proteins from several species.
>
> If you are interested in getting alignments for all possible isoforms 
> (each possible protein from each gene), you would have to use the 
> Ensembl families. These are groups of similar proteins, but you should 
> not assume that they are all orthologues. To infer orthology, you need 
> a phylogenetic tree. The trees we provide are built using a single 
> representative protein per gene.
>
> In your case, I would recommend using the Ensembl families: query the 
> families with each cow protein (cow is your query species, isn't it?) 
> and dump the alignments. There are several options for this. You may 
> want to use all possible species (the families are built using Ensembl 
> and non-Ensembl proteins) or limit the alignment to a subset of 
> species. Also, in some cases you will find that more than one cow 
> protein is in the same family, so you will get duplicated 
> alignments. Is this OK?
>
> Kind regards
>
> Javier
>
> On 05/11/12 13:47, srodriguez wrote:
>> Hi Javier,
>>
>> Thank you for your answer.
>>
>> Actually, I would like to obtain one file per query protein, with that 
>> protein aligned to its orthologous proteins from all the other species 
>> (and not just one sequence aligned to one sequence).
>>
>> ex:
>> for protein ENSBTAP00000032594, the file containing:
>> ENSBTAP00000032594/1-397 
>> MDALRASAAKPPTGRKMKARAPPPPGKPATPNLHSGQRSPRRASPGPPQNQLSR
>> ENSP00000265136/1-1261 
>> MDAPRASAAKPPTGRKMKARAPPPPGKAATLHVHSDQKPPHDGALGSQQNLVRMK
>> ENSSPECIE2...
>> ENSSPECIE3...
>>                          *** ***********************.** ::**.*:.*: .: *. ** :
>>
>> Also, I would like to have one file per query protein: if a gene has 
>> several proteins, each of them should get its own file with the 
>> alignment as above.
>>
>> Do you know if it is feasible to obtain such an output with Ensembl 
>> compara?
>>
>> In that case, could you please modify the script to obtain it?
>>
>> Thank you very much in advance.
>>
>> Best regards,
>>
>> Sabrina.
>>
>>
>>
>>
>>
>>
>> Javier Herrero <jherrero at ebi.ac.uk> wrote:
>>
>>> Dear Sabrina
>>>
>>> I have only modified the script slightly. Essentially, I have 
>>> removed some bits that were not required and cleaned up the code a 
>>> little. I have also added the possibility of specifying the query 
>>> and the target species on the command line. Finally, I have 
>>> changed the script to output the alignments into separate files.
>>>
>>> Your strategy using the ENSEMBLGENE was correct. Indeed, you get two 
>>> proteins aligned. I believe this is what you want, isn't it?
>>>
>>> I have added a few comments. Let me know if there is anything that 
>>> is not clear.
>>>
>>> Javier
>>>
>>> On 22/10/12 15:58, srodriguez wrote:
>>>> Dear all,
>>>>
>>>> I would like to use the Ensembl Compara API to get the protein 
>>>> sequences of a query species aligned with the homologous protein 
>>>> sequences from other species.
>>>>
>>>> The script would take as input the query species name (and, if 
>>>> possible, the hit species names). It would get the proteins 
>>>> of the query organism, then the homologous protein sequences, and 
>>>> then write one file per query protein containing the 
>>>> alignment, with the query placed as the first sequence followed by 
>>>> the aligned protein sequences from the other species.
>>>>
>>>> I was thinking about using a "homology adaptor" with ENSEMBLPEP, 
>>>> so I started a script that way, but I do not obtain any results 
>>>> with ENSEMBLPEP, and with ENSEMBLGENE each alignment contains only 
>>>> 2 sequences (see script attached).
>>>>
>>>> I also tried with "families", but sometimes I do not get the 
>>>> protein sequence for my query species in the sequence alignment, even 
>>>> though I searched using my taxon id (script no. 2 attached).
>>>>
>>>> Do you have a script that already does this?
>>>>
>>>> If not, could you please help me reach my goal?
>>>>
>>>> Thank you very much in advance.
>>>>
>>>> Best regards,
>>>>
>>>> Sabrina.
>>>>
>>>>
>>>> *******************************************
>>>> Sabrina Rodriguez
>>>> Bioinformatics
>>>> Département de Génétique animale
>>>> Unité GABI
>>>> Domaine de Vilvert
>>>> 78532 Jouy en josas
>>>>
>>>> +33 (0) 1 34 65 29 53
>>>>
>>>>
>>>> _______________________________________________
>>>> Dev mailing list Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info: 
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>> -- 
>>> Javier Herrero, PhD
>>> Ensembl Coordinator and Ensembl Compara Project Leader
>>> European Bioinformatics Institute (EMBL-EBI)
>>> Wellcome Trust Genome Campus, Hinxton
>>> Cambridge - CB10 1SD - UK
>>>
>>>
>>

-- 
Javier Herrero, PhD
Ensembl Coordinator and Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK