[ensembl-dev] newbie with Ensembl API- alignments
Manuel Rodríguez Pascual
manuel.rodriguez.pascual at gmail.com
Wed Jan 15 11:42:40 GMT 2014
Dear Stephen,
You were absolutely right. when retrieving just a single block (my
($ref_start, $ref_end) = (7247302, 7267366), for example) phyml works
seamlessly.
I however feel that this is just a workaround. As the web example also
works with phyml and is composed by more than one block, I tend to think
that the problem is somewhere else (maybe in the alignment you mentioned?).
Anyway, finding this issue is out of the scope of my work right now: I am
focused on testing this as a proof of concept of the technology, that has
succeeded thanks to your help. I only hope that if anyone else tries to
perform the same operation finds this useful as a starting point.
Thanks again for your fast and reliable support. Best regards,
Manuel
2014/1/14 Stephen Fitzgerald <stephenf at ebi.ac.uk>
> Hi Manuel, yes, there is a difference between the data retrieved by the
> web and the data retrieved by the API code.
> Because the region (chr19:7112266-7294045) is quite large, it is actually
> composed of 6 primate alignment blocks:
> +-------------+-----------+
> | block_start | block_end |
> +-------------+-----------+
> | 7058566 | 7122884 |
> | 7122885 | 7192725 |
> | 7192726 | 7211509 |
> | 7211510 | 7226908 |
> | 7226909 | 7247301 |
> | 7247302 | 7267366 |
> +-------------+-----------+
>
> In this alignment:
> http://www.ensembl.org/Homo_sapiens/Gene/Compara_
> Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045
> The web-code retrieves the full alignment for the last 5 blocks and a
> truncated (restricted) region from the first block (7112266-7122884).
> However, the web-code will connect the blocks with unaligned
> reference sequence (human in this case) if there are any gaps
> between the blocks (in this region there are no gaps between the blocks)
> and pad out any upstream or downstream regions (where there are no
> alignment blocks) with reference sequence.
> In this case the downstream (on the +ve strand) 26679 bp in the
> 7112266-7294045 region has no primate alignments (so, you get unaligned
> human sequence for this region).
>
> In contrast, the script will only retrieve the primate alignment blocks
> associated with the region (no extra reference padding sequence will be
> added) to connect or extend the blocks. In the above example, if you
> output the results of the script to a file, you will get six blocks in the
> one file. I have not run phyml/phylip, but it may expect only one block of
> alignment per file.
>
> Hope this helps.
> Stephen.
>
>
>
>
> On Tue, 14 Jan 2014, Manuel Rodríguez Pascual wrote:
>
> Hi all,
>> Thanks for your valuable suggestions. however, I still have some minor
>> issues.
>>
>> I understand that in Stephen's code, in order to obtain results
>> equivalent to the ones in the web,
>>
>> http://www.ensembl.org/Homo_sapiens/Gene/Compara_
>> Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045
>>
>> variables ($ref_start, $ref_end) should be asigned to (7112266,
>> 7294045), as my desired gene region is "Chromosome 19:
>> 7,112,266-7,294,045 reverse strand". I have called
>> $ref_slice_adaptor->fetch_by_region with strand 1 and -1.
>>
>> However, the results I get are not the same than in your web example.
>> Also, when executing PHYML on the results, it suceeds on the web example
>> but fails on the one obtained with Stephen's code. I don't know wheter it
>> is a minor format issue or a deeper problem. Have you got any
>> experience on running phyml/phylip on the fetched results?
>>
>>
>> Thank you very much for your help. Best regards,
>>
>>
>> Manuel
>>
>>
>>
>>
>>
>>
>> 2014/1/13 Stephen Fitzgerald <stephenf at ebi.ac.uk>
>> Hi Manuel, you can also try our REST service, which will give you
>> the option of accessing the data (output in JSON format) in a
>> number of ways:
>> http://beta.rest.ensembl.org/documentation/info/genomic_
>> alignment_block_region
>> for example (using human coords 19:7184302-7184344):
>> bash> curl 'http://beta.rest.ensembl.org/
>> alignment/block/region/homo_sapiens/19:7184302-7184344:1?
>> species_set_group=primates' -H
>> 'Content-type:application/json'
>>
>> Also, if you wish to use the perl API, here is a script for
>> retrieving the alignments in phylip format:
>> bash> perl script.pl > out.phylip
>>
>> ################ script.pl
>>
>> use strict;
>> use warnings;
>> use Data::Dumper;
>> use Bio::AlignIO;
>>
>> use Bio::EnsEMBL::Registry;
>>
>> #Auto-configure the registry
>> Bio::EnsEMBL::Registry->load_registry_from_db(
>> -host=>"ensembldb.ensembl.org", -user=>"anonymous",
>> -port=>'5306', db_version => 74);
>>
>>
>> # Get the Compara Adaptor for MethodLinkSpeciesSets
>> my $method_link_species_set_adaptor =
>> Bio::EnsEMBL::Registry->get_adaptor(
>> "Multi", "compara", "MethodLinkSpeciesSet");
>>
>> my $methodLinkSpeciesSet = $method_link_species_set_adaptor->
>> fetch_by_method_link_type_species_set_name("EPO",
>> "primates");
>>
>> # Define the start and end positions for the alignment
>> my ($ref_start, $ref_end) = (7184302, 7184344);
>>
>> # Get the reference species *core* Adaptor for Slices
>> my $ref_slice_adaptor =
>> Bio::EnsEMBL::Registry->get_adaptor(
>> "homo_sapiens", "core", "Slice");
>>
>> # Get the slice corresponding to the region of interest
>> my $ref_slice = $ref_slice_adaptor->fetch_by_region(
>> "chromosome", 19, $ref_start, $ref_end);
>>
>> # Get the Compara Adaptor for GenomicAlignBlocks
>> my $genomic_align_block_adaptor =
>> Bio::EnsEMBL::Registry->get_adaptor(
>> "Multi", "compara", "GenomicAlignBlock");
>>
>> # The fetch_all_by_MethodLinkSpeciesSet_Slice() returns a ref.
>> # to an array of GenomicAlingBlock objects (human is the reference
>> species)
>> my $all_genomic_align_blocks = $genomic_align_block_adaptor->
>> fetch_all_by_MethodLinkSpeciesSet_Slice(
>> $methodLinkSpeciesSet, $ref_slice);
>>
>> # set up an AlignIO to format SimpleAlign output
>> my $alignIO = Bio::AlignIO->newFh(-interleaved => 0,
>> -fh => \*STDOUT,
>> -format => 'phylip',
>> -idlength => 30);
>>
>> # print the restricted alignments
>> foreach my $genomic_align_block( @{ $all_genomic_align_blocks }) {
>> my $restricted_gab = $genomic_align_block->
>> restrict_between_reference_positions($ref_start, $ref_end);
>> print $alignIO $restricted_gab->get_SimpleAlign;
>> }
>>
>> ################ out.phylip
>> 6 44
>> homo_sapiens/19 cccc-agacccacatccagaactcacttgctggaa
>> ttcatcgtg
>> pongo_abelii/19 cccc-agacctacatccagaactcacttgctggaa
>> ttcatcgtg
>> pan_troglodytes/19 cccc-agacccacaaccagaactcacttgctggaa
>> ttcatcgtg
>> gorilla_gorilla/19 cccc-agacccacatctagaactcacttgctggaa
>> ttcatcgtg
>> callithrix_jacchus/22 ccccaagacccacatgcaggactcacttgc
>> tggaattcatcgtg
>> macaca_mulatta/19 cccc-ggacccacatacagaactcacttgctggaa
>> ttcatcgtg
>>
>>
>> Cheers,
>> Stephen.
>>
>>
>>
>> On Mon, 13 Jan 2014, Emily Pritchard wrote:
>>
>> Hi Manuel
>>
>> Have you seen our API course on EBI Train Online:
>> http://www.ebi.ac.uk/training/online/course/ensembl-filmed-
>> api-workshop
>>
>> The Core section will introduce you to the API itself, and
>> the Compara module for alignments.
>>
>> Hope this helps
>>
>> Emily
>>
>> On 13/01/2014 13:32, Manuel Rodríguez Pascual wrote:
>> I am just starting working with Ensembl API. I am not
>> really experienced neither with Ensembl or bioinformatics
>> itself,so I am stuck
>> in a problem that seems to be easy to solve.
>>
>>
>> As a test and proof of concept, I am interested to retrieve
>> information of a comparison in phylip format. In particular,
>> the provided
>> example
>> http://www.ensembl.org/Homo_sapiens/Gene/Compara_
>> Alignments?align=548&db=core&g=ENSG00000171105&r=19%3A7112266-7294045
>>
>> when exported into a phylip format,
>>
>> http://www.ensembl.org/Homo_sapiens/Export/Output/
>> Location/Alignment?align=548;db=core;g=ENSG00000171105;
>> output=alignment;r=19:7112266-7294045;
>> format=phylip;_format=Text
>>
>> I have however seen that the employment of wget is
>> discouraged and the Ensembl API should be used instead.
>>
>> Reading the available documentation, I fell that I should use
>> the Compara API Tutorial,
>> http://www.ensembl.org/info/docs/api/compara/compara_
>> tutorial.html
>>
>> but I don't really understand it or how to apply it to my
>> objective.
>>
>> The question is, can anyone orient me on how to proceed?
>>
>>
>>
>> Thanks for your attention,
>>
>>
>>
>> Manuel
>>
>>
>> --
>> Dr. Manuel Rodríguez-Pascual
>> skype: manuel.rodriguez.pascual
>> phone: (+34) 913466173 // (+34) 679925108
>>
>> CIEMAT-Moncloa
>> Edificio 22, desp. 1.25
>> Avenida Complutense, 40
>> 28040- MADRID
>> SPAIN
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>> --
>> Dr Emily Pritchard
>> Ensembl Outreach Officer
>>
>> European Bioinformatics Institute (EMBL-EBI)
>> European Molecular Biology Laboratory
>> Wellcome Trust Genome Campus
>> Hinxton
>> Cambridge
>> CB10 1SD
>> UK
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>>
>>
>> --
>> Dr. Manuel Rodríguez-Pascual
>> skype: manuel.rodriguez.pascual
>> phone: (+34) 913466173 // (+34) 679925108
>>
>> CIEMAT-Moncloa
>> Edificio 22, desp. 1.25
>> Avenida Complutense, 40
>> 28040- MADRID
>> SPAIN
>>
>>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
--
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108
CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20140115/8d0257a3/attachment.html>
More information about the Dev
mailing list