[ensembl-dev] newbie with Ensembl API- alignments

Manuel Rodríguez Pascual manuel.rodriguez.pascual at gmail.com
Wed Jan 15 11:42:40 GMT 2014


Dear Stephen,

You were absolutely right. when retrieving just  a single block (my
($ref_start, $ref_end) = (7247302, 7267366),  for example) phyml works
seamlessly.

I however feel that this is just a workaround. As the web example also
works with phyml and is composed by more than one block, I tend to think
that the problem is somewhere else (maybe in the alignment you mentioned?).
Anyway, finding this issue is out of the scope of my work right now: I am
focused on testing this as a proof of concept of the technology, that has
succeeded thanks to your help. I only hope that if anyone else tries to
perform the same operation finds this useful as a starting point.

Thanks again for your fast and reliable support. Best regards,



Manuel



2014/1/14 Stephen Fitzgerald <stephenf at ebi.ac.uk>

> Hi Manuel, yes, there is a difference between the data retrieved by the
> web and the data retrieved by the API code.
> Because the region (chr19:7112266-7294045) is quite large, it is actually
> composed of 6 primate alignment blocks:
> +-------------+-----------+
> | block_start | block_end |
> +-------------+-----------+
> |     7058566 |   7122884 |
> |     7122885 |   7192725 |
> |     7192726 |   7211509 |
> |     7211510 |   7226908 |
> |     7226909 |   7247301 |
> |     7247302 |   7267366 |
> +-------------+-----------+
>
> In this alignment:
> http://www.ensembl.org/Homo_sapiens/Gene/Compara_
> Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045
> The web-code retrieves the full alignment for the last 5 blocks and a
> truncated (restricted) region from the first block (7112266-7122884).
> However, the web-code will connect the blocks with unaligned
> reference sequence (human in this case) if there are any gaps
> between the blocks (in this region there are no gaps between the blocks)
> and pad out any upstream or downstream regions (where there are no
> alignment blocks) with reference sequence.
> In this case the downstream (on the +ve strand) 26679 bp in the
> 7112266-7294045 region has no primate alignments (so, you get unaligned
> human sequence for this region).
>
> In contrast, the script will only retrieve the primate alignment blocks
> associated with the region (no extra reference padding sequence will be
> added) to connect or extend the blocks. In the above example, if you
> output the results of the script to a file, you will get six blocks in the
> one file. I have not run phyml/phylip, but it may expect only one block of
> alignment per file.
>
> Hope this helps.
> Stephen.
>
>
>
>
> On Tue, 14 Jan 2014, Manuel Rodríguez Pascual wrote:
>
>  Hi all,
>> Thanks for your valuable suggestions. however, I still have some minor
>> issues.
>>
>> I understand that in Stephen's code, in order to obtain results
>> equivalent to the ones in the web,
>>
>> http://www.ensembl.org/Homo_sapiens/Gene/Compara_
>> Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045
>>
>> variables ($ref_start, $ref_end) should be asigned to  (7112266,
>> 7294045), as my desired gene region is "Chromosome 19:
>> 7,112,266-7,294,045 reverse strand". I have called
>>  $ref_slice_adaptor->fetch_by_region with strand 1 and -1.
>>
>> However, the results I get are not the same than in your web example.
>> Also, when executing PHYML on the results, it suceeds on the web example
>> but fails on the one obtained with Stephen's code. I don't know wheter it
>> is a minor format issue or a deeper problem. Have you got any
>> experience on running phyml/phylip on the fetched results?
>>
>>
>> Thank you very much for your help. Best regards,
>>
>>
>> Manuel
>>
>>
>>
>>
>>
>>
>> 2014/1/13 Stephen Fitzgerald <stephenf at ebi.ac.uk>
>>       Hi Manuel, you can also try our REST service, which will give you
>> the option of accessing the data (output in JSON format) in a
>>       number of ways:
>>       http://beta.rest.ensembl.org/documentation/info/genomic_
>> alignment_block_region
>>       for example (using human coords 19:7184302-7184344):
>>       bash> curl 'http://beta.rest.ensembl.org/
>> alignment/block/region/homo_sapiens/19:7184302-7184344:1?
>> species_set_group=primates' -H
>>       'Content-type:application/json'
>>
>>       Also, if you wish to use the perl API, here is a script for
>> retrieving the alignments in phylip format:
>>       bash> perl script.pl > out.phylip
>>
>>       ################ script.pl
>>
>>       use strict;
>>       use warnings;
>>       use Data::Dumper;
>>       use Bio::AlignIO;
>>
>>       use Bio::EnsEMBL::Registry;
>>
>>       #Auto-configure the registry
>>       Bio::EnsEMBL::Registry->load_registry_from_db(
>>               -host=>"ensembldb.ensembl.org", -user=>"anonymous",
>>               -port=>'5306', db_version => 74);
>>
>>
>>       # Get the Compara Adaptor for MethodLinkSpeciesSets
>>       my $method_link_species_set_adaptor =
>>           Bio::EnsEMBL::Registry->get_adaptor(
>>             "Multi", "compara", "MethodLinkSpeciesSet");
>>
>>       my $methodLinkSpeciesSet = $method_link_species_set_adaptor->
>>               fetch_by_method_link_type_species_set_name("EPO",
>> "primates");
>>
>>       # Define the start and end positions for the alignment
>>       my ($ref_start, $ref_end) = (7184302, 7184344);
>>
>>       # Get the reference species *core* Adaptor for Slices
>>       my $ref_slice_adaptor =
>>           Bio::EnsEMBL::Registry->get_adaptor(
>>             "homo_sapiens", "core", "Slice");
>>
>>       # Get the slice corresponding to the region of interest
>>       my $ref_slice = $ref_slice_adaptor->fetch_by_region(
>>           "chromosome", 19, $ref_start, $ref_end);
>>
>>       # Get the Compara Adaptor for GenomicAlignBlocks
>>       my $genomic_align_block_adaptor =
>>           Bio::EnsEMBL::Registry->get_adaptor(
>>             "Multi", "compara", "GenomicAlignBlock");
>>
>>       # The fetch_all_by_MethodLinkSpeciesSet_Slice() returns a ref.
>>       # to an array of GenomicAlingBlock objects (human is the reference
>> species)
>>       my $all_genomic_align_blocks = $genomic_align_block_adaptor->
>>           fetch_all_by_MethodLinkSpeciesSet_Slice(
>>               $methodLinkSpeciesSet, $ref_slice);
>>
>>       # set up an AlignIO to format SimpleAlign output
>>       my $alignIO = Bio::AlignIO->newFh(-interleaved => 0,
>>                                         -fh => \*STDOUT,
>>                                         -format => 'phylip',
>>                                         -idlength => 30);
>>
>>       # print the restricted alignments
>>       foreach my $genomic_align_block( @{ $all_genomic_align_blocks }) {
>>               my $restricted_gab = $genomic_align_block->
>> restrict_between_reference_positions($ref_start, $ref_end);
>>               print $alignIO $restricted_gab->get_SimpleAlign;
>>       }
>>
>>       ################ out.phylip
>>        6 44
>>       homo_sapiens/19               cccc-agacccacatccagaactcacttgctggaa
>> ttcatcgtg
>>       pongo_abelii/19               cccc-agacctacatccagaactcacttgctggaa
>> ttcatcgtg
>>       pan_troglodytes/19            cccc-agacccacaaccagaactcacttgctggaa
>> ttcatcgtg
>>       gorilla_gorilla/19            cccc-agacccacatctagaactcacttgctggaa
>> ttcatcgtg
>>       callithrix_jacchus/22         ccccaagacccacatgcaggactcacttgc
>> tggaattcatcgtg
>>       macaca_mulatta/19             cccc-ggacccacatacagaactcacttgctggaa
>> ttcatcgtg
>>
>>
>>       Cheers,
>>       Stephen.
>>
>>
>>
>>       On Mon, 13 Jan 2014, Emily Pritchard wrote:
>>
>>             Hi Manuel
>>
>>             Have you seen our API course on EBI Train Online:
>>             http://www.ebi.ac.uk/training/online/course/ensembl-filmed-
>> api-workshop
>>
>>             The Core section will introduce you to the API itself, and
>> the Compara module for alignments.
>>
>>             Hope this helps
>>
>>             Emily
>>
>>             On 13/01/2014 13:32, Manuel Rodríguez Pascual wrote:
>>                   I am just starting working with Ensembl API. I am not
>> really experienced neither with Ensembl or bioinformatics
>>             itself,so I am stuck
>>                   in a problem that seems to be easy to solve.
>>
>>
>>             As a test and proof of concept, I am interested to retrieve
>> information of a comparison in phylip format. In particular,
>>             the provided
>>             example
>>             http://www.ensembl.org/Homo_sapiens/Gene/Compara_
>> Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045
>>
>>             when exported into a phylip format,
>>
>>             http://www.ensembl.org/Homo_sapiens/Export/Output/
>> Location/Alignment?align=548;db=core;g=ENSG00000171105;
>> output=alignment;r=19:7112266-7294045;
>>             format=phylip;_format=Text
>>
>>             I have however seen that the employment of wget is
>> discouraged and the Ensembl API should be used instead.
>>
>>             Reading the available documentation, I fell that I should use
>> the Compara API Tutorial,
>>             http://www.ensembl.org/info/docs/api/compara/compara_
>> tutorial.html
>>
>>             but I don't really understand it or how to apply it to my
>> objective.
>>
>>             The question is, can anyone orient me on how to proceed?
>>
>>
>>
>>             Thanks for your attention,
>>
>>
>>
>>             Manuel
>>
>>
>>             --
>>             Dr. Manuel Rodríguez-Pascual
>>             skype: manuel.rodriguez.pascual
>>             phone: (+34) 913466173 // (+34) 679925108
>>
>>             CIEMAT-Moncloa
>>             Edificio 22, desp. 1.25
>>             Avenida Complutense, 40
>>             28040- MADRID
>>             SPAIN
>>
>>
>>             _______________________________________________
>>             Dev mailing list    Dev at ensembl.org
>>             Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>>             Ensembl Blog: http://www.ensembl.info/
>>
>>
>>             --
>>             Dr Emily Pritchard
>>             Ensembl Outreach Officer
>>
>>             European Bioinformatics Institute (EMBL-EBI)
>>             European Molecular Biology Laboratory
>>             Wellcome Trust Genome Campus
>>             Hinxton
>>             Cambridge
>>             CB10 1SD
>>             UK
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>>
>>
>> --
>> Dr. Manuel Rodríguez-Pascual
>> skype: manuel.rodriguez.pascual
>> phone: (+34) 913466173 // (+34) 679925108
>>
>> CIEMAT-Moncloa
>> Edificio 22, desp. 1.25
>> Avenida Complutense, 40
>> 28040- MADRID
>> SPAIN
>>
>>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>


-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20140115/8d0257a3/attachment.html>


More information about the Dev mailing list