[ensembl-dev] newbie with Ensembl API- alignments

Manuel Rodríguez Pascual manuel.rodriguez.pascual at gmail.com
Tue Jan 14 09:59:08 GMT 2014


Hi all,

Thanks for your valuable suggestions. however, I still have some minor
issues.

I understand that in Stephen's code, in order to obtain results equivalent
to the ones in the web,

http://www.ensembl.org/Homo_sapiens/Gene/Compara_Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045

variables ($ref_start, $ref_end) should be asigned to  (7112266, 7294045),
as my desired gene region is "Chromosome 19:
7,112,266-7,294,045<http://www.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000171105;r=19:7112266-7294045>
reverse
strand". I have called  $ref_slice_adaptor->fetch_by_region with strand 1
and -1.

However, the results I get are not the same than in your web example. Also,
when executing PHYML on the results, it suceeds on the web example but
fails on the one obtained with Stephen's code. I don't know wheter it is a
minor format issue or a deeper problem. Have you got any experience on
running phyml/phylip on the fetched results?


Thank you very much for your help. Best regards,


Manuel






2014/1/13 Stephen Fitzgerald <stephenf at ebi.ac.uk>

> Hi Manuel, you can also try our REST service, which will give you the
> option of accessing the data (output in JSON format) in a number of ways:
> http://beta.rest.ensembl.org/documentation/info/genomic_
> alignment_block_region
> for example (using human coords 19:7184302-7184344):
> bash> curl 'http://beta.rest.ensembl.org/alignment/block/region/homo_
> sapiens/19:7184302-7184344:1?species_set_group=primates' -H
> 'Content-type:application/json'
>
> Also, if you wish to use the perl API, here is a script for retrieving the
> alignments in phylip format:
> bash> perl script.pl > out.phylip
>
> ################ script.pl
>
> use strict;
> use warnings;
> use Data::Dumper;
> use Bio::AlignIO;
>
> use Bio::EnsEMBL::Registry;
>
> #Auto-configure the registry
> Bio::EnsEMBL::Registry->load_registry_from_db(
>         -host=>"ensembldb.ensembl.org", -user=>"anonymous",
>         -port=>'5306', db_version => 74);
>
>
> # Get the Compara Adaptor for MethodLinkSpeciesSets
> my $method_link_species_set_adaptor =
>     Bio::EnsEMBL::Registry->get_adaptor(
>       "Multi", "compara", "MethodLinkSpeciesSet");
>
> my $methodLinkSpeciesSet = $method_link_species_set_adaptor->
>         fetch_by_method_link_type_species_set_name("EPO", "primates");
>
> # Define the start and end positions for the alignment
> my ($ref_start, $ref_end) = (7184302, 7184344);
>
> # Get the reference species *core* Adaptor for Slices
> my $ref_slice_adaptor =
>     Bio::EnsEMBL::Registry->get_adaptor(
>       "homo_sapiens", "core", "Slice");
>
> # Get the slice corresponding to the region of interest
> my $ref_slice = $ref_slice_adaptor->fetch_by_region(
>     "chromosome", 19, $ref_start, $ref_end);
>
> # Get the Compara Adaptor for GenomicAlignBlocks
> my $genomic_align_block_adaptor =
>     Bio::EnsEMBL::Registry->get_adaptor(
>       "Multi", "compara", "GenomicAlignBlock");
>
> # The fetch_all_by_MethodLinkSpeciesSet_Slice() returns a ref.
> # to an array of GenomicAlingBlock objects (human is the reference species)
> my $all_genomic_align_blocks = $genomic_align_block_adaptor->
>     fetch_all_by_MethodLinkSpeciesSet_Slice(
>         $methodLinkSpeciesSet, $ref_slice);
>
> # set up an AlignIO to format SimpleAlign output
> my $alignIO = Bio::AlignIO->newFh(-interleaved => 0,
>                                   -fh => \*STDOUT,
>                                   -format => 'phylip',
>                                   -idlength => 30);
>
> # print the restricted alignments
> foreach my $genomic_align_block( @{ $all_genomic_align_blocks }) {
>         my $restricted_gab = $genomic_align_block->
> restrict_between_reference_positions($ref_start, $ref_end);
>         print $alignIO $restricted_gab->get_SimpleAlign;
> }
>
> ################ out.phylip
>  6 44
> homo_sapiens/19               cccc-agacccacatccagaactcacttgctggaattcatcgtg
> pongo_abelii/19               cccc-agacctacatccagaactcacttgctggaattcatcgtg
> pan_troglodytes/19            cccc-agacccacaaccagaactcacttgctggaattcatcgtg
> gorilla_gorilla/19            cccc-agacccacatctagaactcacttgctggaattcatcgtg
> callithrix_jacchus/22         ccccaagacccacatgcaggactcacttgctggaattcatcgtg
> macaca_mulatta/19             cccc-ggacccacatacagaactcacttgctggaattcatcgtg
>
>
> Cheers,
> Stephen.
>
>
>
>
> On Mon, 13 Jan 2014, Emily Pritchard wrote:
>
>  Hi Manuel
>>
>> Have you seen our API course on EBI Train Online:
>> http://www.ebi.ac.uk/training/online/course/ensembl-filmed-api-workshop
>>
>> The Core section will introduce you to the API itself, and the Compara
>> module for alignments.
>>
>> Hope this helps
>>
>> Emily
>>
>> On 13/01/2014 13:32, Manuel Rodríguez Pascual wrote:
>>       I am just starting working with Ensembl API. I am not really
>> experienced neither with Ensembl or bioinformatics itself,so I am stuck
>>       in a problem that seems to be easy to solve.
>>
>>
>> As a test and proof of concept, I am interested to retrieve information
>> of a comparison in phylip format. In particular, the provided
>> example
>> http://www.ensembl.org/Homo_sapiens/Gene/Compara_
>> Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045
>>
>> when exported into a phylip format,
>>
>> http://www.ensembl.org/Homo_sapiens/Export/Output/
>> Location/Alignment?align=548;db=core;g=ENSG00000171105;
>> output=alignment;r=19:7112266-7294045;
>> format=phylip;_format=Text
>>
>> I have however seen that the employment of wget is discouraged and
>> the Ensembl API should be used instead.
>>
>> Reading the available documentation, I fell that I should use the Compara
>> API Tutorial,
>> http://www.ensembl.org/info/docs/api/compara/compara_tutorial.html
>>
>> but I don't really understand it or how to apply it to my objective.
>>
>> The question is, can anyone orient me on how to proceed?
>>
>>
>>
>> Thanks for your attention,
>>
>>
>>
>> Manuel
>>
>>
>> --
>> Dr. Manuel Rodríguez-Pascual
>> skype: manuel.rodriguez.pascual
>> phone: (+34) 913466173 // (+34) 679925108
>>
>> CIEMAT-Moncloa
>> Edificio 22, desp. 1.25
>> Avenida Complutense, 40
>> 28040- MADRID
>> SPAIN
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>> --
>> Dr Emily Pritchard
>> Ensembl Outreach Officer
>>
>> European Bioinformatics Institute (EMBL-EBI)
>> European Molecular Biology Laboratory
>> Wellcome Trust Genome Campus
>> Hinxton
>> Cambridge
>> CB10 1SD
>> UK
>>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>


-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20140114/35802c88/attachment.html>


More information about the Dev mailing list