[ensembl-dev] EnsEMBL compara / protein sequence alignments

Mon Nov 5 13:47:01 GMT 2012

Hi Javier,

Thank you for your answer.

Actually, I would like to obtain one file per query protein, with that
protein aligned to its orthologous proteins from all the other species
(a multiple alignment, not one sequence against one sequence).

For example, for protein ENSBTAP00000032594, the file would contain:
ENSBTAP00000032594/1-397  
MDALRASAAKPPTGRKMKARAPPPPGKPATPNLHSGQRSPRRASPGPPQNQLSR
ENSP00000265136/1-1261    
MDAPRASAAKPPTGRKMKARAPPPPGKAATLHVHSDQKPPHDGALGSQQNLVRMK
ENSSPECIE2...
ENSSPECIE3...
                          *** ***********************.** ::**.*:.*: .: *. ** :

Also, I would like to have one file per query protein: if a gene has
several proteins, each of those proteins should get its own file,
containing an alignment like the one above.
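
To make the layout I am asking for concrete, here is a minimal sketch in
Python. It is not the Ensembl API and not the attached script: the
function names, the ".aln" file extension and the input structure (a
mapping from each query protein ID to its already-aligned, gapped
sequences) are all assumptions made for illustration. Retrieving the
aligned sequences from Compara is left out; the sketch only shows the
"query first, one file per query protein" part.

```python
from pathlib import Path


def ungapped_length(aligned_seq):
    """Number of residues in an aligned sequence, ignoring '-' gap characters."""
    return len(aligned_seq.replace("-", ""))


def format_alignment(query_id, aligned):
    """Return the alignment as text, with the query protein as the first row.

    `aligned` (hypothetical input) maps a protein stable ID to its gapped
    sequence; all rows must have the same aligned length.
    """
    if query_id not in aligned:
        raise KeyError(f"query {query_id} is missing from its own alignment")
    if len({len(seq) for seq in aligned.values()}) != 1:
        raise ValueError("aligned sequences differ in length")
    # Query first, then the remaining proteins in their original order.
    ordered = [query_id] + [pid for pid in aligned if pid != query_id]
    # Labels follow the "ID/1-N" style of the example, N = ungapped length.
    labels = {pid: f"{pid}/1-{ungapped_length(aligned[pid])}" for pid in ordered}
    width = max(len(label) for label in labels.values()) + 2
    return "".join(f"{labels[pid]:<{width}}{aligned[pid]}\n" for pid in ordered)


def write_alignment_files(alignments, out_dir):
    """Write one file per query protein; return the list of paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for query_id, aligned in alignments.items():
        path = out_dir / f"{query_id}.aln"
        path.write_text(format_alignment(query_id, aligned))
        paths.append(path)
    return paths
```

With this shape, a gene with several proteins simply contributes several
keys to `alignments`, so each protein ends up in its own file.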

Do you know if it is feasible to obtain such an output with Ensembl compara?

If so, could you please modify the script to produce it?

Thank you very much in advance.

Best regards,

Sabrina.

Javier Herrero <jherrero at ebi.ac.uk> wrote:

> Dear Sabrina
>
> I have only modified the script slightly. Essentially, I have
> removed some bits that were not required and cleaned up the code a
> little. I have also added the possibility of specifying the query
> and the target species on the command line. Lastly, I have
> changed the script to output the alignments into separate files.
>
> Your strategy using the ENSEMBLGENE was correct. Indeed, you get two  
> proteins aligned. I believe this is what you want, isn't it?
>
> I have added a few comments. Let me know if there is something that is
> not clear.
>
> Javier
>
> On 22/10/12 15:58, srodriguez wrote:
>> Dear all,
>>
>> I would like to use the Ensembl Compara API to get the protein
>> sequences of a query species aligned with the homologous protein
>> sequences from other species.
>>
>> The script would take as input the query species name (and, if
>> possible, the target species names). It would get the proteins
>> of the query organism, then the homologous protein sequences, and
>> then write one file per query protein containing the
>> alignment, with the query placed as the first sequence, followed by
>> the aligned protein sequences from the other species.
>>
>> I was thinking about using a "homology adaptor" with ENSEMBLPEP,
>> so I started a script that way, but I do not obtain any results
>> with ENSEMBLPEP, and the results with ENSEMBLGENE contain only 2
>> sequences per alignment (see script attached).
>>
>> I also tried with "families", but sometimes I do not get the
>> protein sequence for my query species in the sequence alignment, even
>> though I searched using my taxon ID (script no. 2 attached).
>>
>> Would you have a script that already performs my goal?
>>
>> If not, could you please help me reach my goal?
>>
>> Thank you very much in advance.
>>
>> Best regards,
>>
>> Sabrina.
>>
>>
>> *******************************************
>> Sabrina Rodriguez
>> Bioinformatics
>> Département de Génétique animale
>> Unité GABI
>> Domaine de Vilvert
>> 78532 Jouy en josas
>>
>> +33 (0) 1 34 65 29 53
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:  
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
> -- 
> Javier Herrero, PhD
> Ensembl Coordinator and Ensembl Compara Project Leader
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge - CB10 1SD - UK
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sabrina.pl
Type: text/x-perl
Size: 3211 bytes
Desc: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20121105/95966f51/attachment.bin>