[ensembl-dev] newbie with Ensembl API- alignments

Stephen Fitzgerald stephenf at ebi.ac.uk
Tue Jan 14 14:29:29 GMT 2014


Hi Manuel, yes, there is a difference between the data retrieved by the
web and the data retrieved by the API code.
Because the region (chr19:7112266-7294045) is quite large, it is actually
composed of 6 primate alignment blocks:
+-------------+-----------+
| block_start | block_end |
+-------------+-----------+
|     7058566 |   7122884 |
|     7122885 |   7192725 |
|     7192726 |   7211509 |
|     7211510 |   7226908 |
|     7226909 |   7247301 |
|     7247302 |   7267366 |
+-------------+-----------+

In this alignment:
http://www.ensembl.org/Homo_sapiens/Gene/Compara_Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045
The web-code retrieves the full alignment for the last 5 blocks and a
truncated (restricted) region from the first block (7112266-7122884).
However, the web-code will connect the blocks with unaligned
reference sequence (human in this case) if there are any gaps
between the blocks (in this region there are no gaps between the blocks)
and pad out any upstream or downstream regions (where there are no
alignment blocks) with reference sequence.
In this case the downstream (on the +ve strand) 26679 bp in the
7112266-7294045 region has no primate alignments (so, you get unaligned 
human sequence for this region).

In contrast, the script will only retrieve the primate alignment blocks
associated with the region (no extra reference padding sequence will be
added) to connect or extend the blocks. In the above example, if you
output the results of the script to a file, you will get six blocks in the
one file. I have not run phyml/phylip, but it may expect only one block of
alignment per file.

Hope this helps.
Stephen.



On Tue, 14 Jan 2014, Manuel Rodríguez Pascual wrote:

> Hi all, 
> Thanks for your valuable suggestions. however, I still have some minor issues.
> 
> I understand that in Stephen's code, in order to obtain results equivalent to the ones in the web, 
> 
> http://www.ensembl.org/Homo_sapiens/Gene/Compara_Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045
> 
> variables ($ref_start, $ref_end) should be asigned to  (7112266, 7294045), as my desired gene region is "Chromosome 19:
> 7,112,266-7,294,045 reverse strand". I have called  $ref_slice_adaptor->fetch_by_region with strand 1 and -1.
> 
> However, the results I get are not the same than in your web example. Also, when executing PHYML on the results, it suceeds on the web example
> but fails on the one obtained with Stephen's code. I don't know wheter it is a minor format issue or a deeper problem. Have you got any
> experience on running phyml/phylip on the fetched results?
> 
> 
> Thank you very much for your help. Best regards,
> 
> 
> Manuel
> 
> 
> 
> 
> 
> 
> 2014/1/13 Stephen Fitzgerald <stephenf at ebi.ac.uk>
>       Hi Manuel, you can also try our REST service, which will give you the option of accessing the data (output in JSON format) in a
>       number of ways:
>       http://beta.rest.ensembl.org/documentation/info/genomic_alignment_block_region
>       for example (using human coords 19:7184302-7184344):
>       bash> curl 'http://beta.rest.ensembl.org/alignment/block/region/homo_sapiens/19:7184302-7184344:1?species_set_group=primates' -H
>       'Content-type:application/json'
>
>       Also, if you wish to use the perl API, here is a script for retrieving the alignments in phylip format:
>       bash> perl script.pl > out.phylip
>
>       ################ script.pl
>
>       use strict;
>       use warnings;
>       use Data::Dumper;
>       use Bio::AlignIO;
>
>       use Bio::EnsEMBL::Registry;
>
>       #Auto-configure the registry
>       Bio::EnsEMBL::Registry->load_registry_from_db(
>               -host=>"ensembldb.ensembl.org", -user=>"anonymous",
>               -port=>'5306', db_version => 74);
> 
>
>       # Get the Compara Adaptor for MethodLinkSpeciesSets
>       my $method_link_species_set_adaptor =
>           Bio::EnsEMBL::Registry->get_adaptor(
>             "Multi", "compara", "MethodLinkSpeciesSet");
>
>       my $methodLinkSpeciesSet = $method_link_species_set_adaptor->
>               fetch_by_method_link_type_species_set_name("EPO", "primates");
>
>       # Define the start and end positions for the alignment
>       my ($ref_start, $ref_end) = (7184302, 7184344);
>
>       # Get the reference species *core* Adaptor for Slices
>       my $ref_slice_adaptor =
>           Bio::EnsEMBL::Registry->get_adaptor(
>             "homo_sapiens", "core", "Slice");
>
>       # Get the slice corresponding to the region of interest
>       my $ref_slice = $ref_slice_adaptor->fetch_by_region(
>           "chromosome", 19, $ref_start, $ref_end);
>
>       # Get the Compara Adaptor for GenomicAlignBlocks
>       my $genomic_align_block_adaptor =
>           Bio::EnsEMBL::Registry->get_adaptor(
>             "Multi", "compara", "GenomicAlignBlock");
>
>       # The fetch_all_by_MethodLinkSpeciesSet_Slice() returns a ref.
>       # to an array of GenomicAlingBlock objects (human is the reference species)
>       my $all_genomic_align_blocks = $genomic_align_block_adaptor->
>           fetch_all_by_MethodLinkSpeciesSet_Slice(
>               $methodLinkSpeciesSet, $ref_slice);
>
>       # set up an AlignIO to format SimpleAlign output
>       my $alignIO = Bio::AlignIO->newFh(-interleaved => 0,
>                                         -fh => \*STDOUT,
>                                         -format => 'phylip',
>                                         -idlength => 30);
>
>       # print the restricted alignments
>       foreach my $genomic_align_block( @{ $all_genomic_align_blocks }) {
>               my $restricted_gab = $genomic_align_block->restrict_between_reference_positions($ref_start, $ref_end);
>               print $alignIO $restricted_gab->get_SimpleAlign;
>       }
>
>       ################ out.phylip
>        6 44
>       homo_sapiens/19               cccc-agacccacatccagaactcacttgctggaattcatcgtg
>       pongo_abelii/19               cccc-agacctacatccagaactcacttgctggaattcatcgtg
>       pan_troglodytes/19            cccc-agacccacaaccagaactcacttgctggaattcatcgtg
>       gorilla_gorilla/19            cccc-agacccacatctagaactcacttgctggaattcatcgtg
>       callithrix_jacchus/22         ccccaagacccacatgcaggactcacttgctggaattcatcgtg
>       macaca_mulatta/19             cccc-ggacccacatacagaactcacttgctggaattcatcgtg
> 
>
>       Cheers,
>       Stephen.
> 
> 
>
>       On Mon, 13 Jan 2014, Emily Pritchard wrote:
>
>             Hi Manuel
>
>             Have you seen our API course on EBI Train Online:
>             http://www.ebi.ac.uk/training/online/course/ensembl-filmed-api-workshop
>
>             The Core section will introduce you to the API itself, and the Compara module for alignments.
>
>             Hope this helps
>
>             Emily
>
>             On 13/01/2014 13:32, Manuel Rodríguez Pascual wrote:
>                   I am just starting working with Ensembl API. I am not really experienced neither with Ensembl or bioinformatics
>             itself,so I am stuck
>                   in a problem that seems to be easy to solve.
> 
>
>             As a test and proof of concept, I am interested to retrieve information of a comparison in phylip format. In particular,
>             the provided
>             example
>             http://www.ensembl.org/Homo_sapiens/Gene/Compara_Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045
>
>             when exported into a phylip format, 
>
>             http://www.ensembl.org/Homo_sapiens/Export/Output/Location/Alignment?align=548;db=core;g=ENSG00000171105;output=alignment;r=19:7112266-7294045;
>             format=phylip;_format=Text
>
>             I have however seen that the employment of wget is discouraged and the Ensembl API should be used instead.
>
>             Reading the available documentation, I fell that I should use the Compara API Tutorial, 
>             http://www.ensembl.org/info/docs/api/compara/compara_tutorial.html
>
>             but I don't really understand it or how to apply it to my objective.
>
>             The question is, can anyone orient me on how to proceed? 
> 
> 
>
>             Thanks for your attention,
> 
> 
>
>             Manuel 
> 
>
>             -- 
>             Dr. Manuel Rodríguez-Pascual
>             skype: manuel.rodriguez.pascual
>             phone: (+34) 913466173 // (+34) 679925108
>              
>             CIEMAT-Moncloa
>             Edificio 22, desp. 1.25
>             Avenida Complutense, 40 
>             28040- MADRID
>             SPAIN
> 
>
>             _______________________________________________
>             Dev mailing list    Dev at ensembl.org
>             Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>             Ensembl Blog: http://www.ensembl.info/
> 
>
>             --
>             Dr Emily Pritchard
>             Ensembl Outreach Officer
>
>             European Bioinformatics Institute (EMBL-EBI)
>             European Molecular Biology Laboratory
>             Wellcome Trust Genome Campus
>             Hinxton
>             Cambridge
>             CB10 1SD
>             UK
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
> 
> 
> 
> 
> --
> Dr. Manuel Rodríguez-Pascual
> skype: manuel.rodriguez.pascual
> phone: (+34) 913466173 // (+34) 679925108
>  
> CIEMAT-Moncloa
> Edificio 22, desp. 1.25
> Avenida Complutense, 40 
> 28040- MADRID
> SPAIN
> 
>


More information about the Dev mailing list