[ensembl-dev] Problem with DumpAlignedGenes.pl script

Harris, Ronald Alan rharris1 at bcm.edu
Fri Jun 15 06:28:08 BST 2012


Hi Stephen,

Thanks a lot. The basic_ma.pl you attached worked great for downloading small regions. I need to download genome-wide EPO_LOW_COVERAGE alignments for some analyses we are trying to do on the gibbon genome. I was hoping to use the script to download alignments for each chromosome, but when I tried that with chr22 I got a "DBD::mysql::st execute failed: Lost connection to MySQL server during query" error. It looks like whole chromosomes cannot easily be downloaded via the API. Is there somewhere that I can download the EPO_LOW_COVERAGE alignments as text files? I googled around but have not been able to find a place to download them.
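
One possible workaround for the lost connection is to request a chromosome in smaller windows rather than in a single query. The following is only a sketch in plain Perl: make_windows is a hypothetical helper, and the region length and window size are illustrative values, not tuned ones. Whether smaller requests avoid the error above is not confirmed in this thread.

```perl
use strict;
use warnings;

# Split the inclusive range [$start, $end] into consecutive windows of at
# most $size bases. Returns a list of [window_start, window_end] pairs.
sub make_windows {
    my ($start, $end, $size) = @_;
    die "window size must be positive\n" unless $size > 0;
    my @windows;
    for (my $s = $start; $s <= $end; $s += $size) {
        my $e = $s + $size - 1;
        $e = $end if $e > $end;    # clip the final window to the region end
        push @windows, [$s, $e];
    }
    return @windows;
}

# Hypothetical values: a 2,500,000 bp region split into 1 Mb windows.
my @windows = make_windows(1, 2_500_000, 1_000_000);
printf "%d-%d\n", @$_ for @windows;
```

Each start/end pair could then be passed as the coordinates of a separate, smaller request.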



Thanks,



Alan
________________________________________
From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] On Behalf Of Stephen Fitzgerald [stephenf at ebi.ac.uk]
Sent: Thursday, June 14, 2012 9:36 AM
To: Ensembl developers list
Subject: Re: [ensembl-dev] Problem with DumpAlignedGenes.pl script

Hi Alan, I've committed some changes to the DumpAlignedGenes.pl in cvs
(head) which should make it easier to use for retrieving multiple
alignments. The --set_of_species flag is used to find alignments
consisting of that particular set of species (this is why you see the
exception below as the EPO_LOW_COVERAGE alignments consist of 35 species
and not just human:rat:mouse).
I've added a new flag "--species_set_name" which will make it easier to
get alignments from the EPO_LOW_COVERAGE set.

eg.
perl DumpAlignedGenes.pl  --alignment_type EPO_LOW_COVERAGE
--species_set_name mammals --seq_region 8 --seq_region_start 21180813
--seq_region_end 21188452 --species rat --genes_from chimp
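
The difference between the two flags described above can be pictured with a toy model in plain Perl. This is not the Ensembl API: the stored set, its four species and both lookup functions are hypothetical stand-ins, used only to show why a lookup by exact species membership fails for a subset while a lookup by set name succeeds.

```perl
use strict;
use warnings;

# Toy stand-in for the stored alignment sets: each entry has a name and the
# complete list of species it was built from. The real EPO_LOW_COVERAGE set
# has 35 species; four are used here purely for illustration.
my @stored_sets = (
    { name => 'mammals', species => [qw(human chimp rat mouse)] },
);

# Canonical key for a species list, independent of the order given.
sub set_key { return join ':', sort @_ }

# Lookup by exact membership: succeeds only if the requested species are
# exactly the species of a stored set (a subset does not match).
sub find_by_species {
    my @wanted = @_;
    my $key = set_key(@wanted);
    my ($hit) = grep { set_key(@{ $_->{species} }) eq $key } @stored_sets;
    return $hit;
}

# Lookup by name: no need to list every member species.
sub find_by_name {
    my ($name) = @_;
    my ($hit) = grep { $_->{name} eq $name } @stored_sets;
    return $hit;
}

my $by_species = find_by_species(qw(human rat mouse));
my $by_name    = find_by_name('mammals');
print 'by species: ', ($by_species ? 'found' : 'not found'), "\n";
print 'by name: ',    ($by_name    ? 'found' : 'not found'), "\n";
```
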

I've also attached a basic script which will allow you to get all the
alignments from the EPO_LOW_COVERAGE when you define a reference species
and coordinates (I've hard-coded "sus_scrofa" as the reference).

You may also find some parts of this script of use:
http://www.ebi.ac.uk/~stephenf/Workshops/Cambridge_march2012/Exercises/Solutions/align_slice.txt

(It prints the nucleotide bases and assembly positions for both the pig
myostatin (MSTN) gene and its bovine ortholog where the aligned sequences
differ)

and the output from it is here:
http://www.ebi.ac.uk/~stephenf/Workshops/Cambridge_march2012/Exercises/Solutions/align_slice.out.txt

If you have any more questions, get back in touch.

All the best,
Stephen.




On Wed, 13 Jun 2012, Harris, Ronald Alan wrote:

> Hi Kathryn and Stephen,
>
> Thanks for helping me out with DumpAlignedGenes.pl. The scripts that both of you sent me now work, but there is another problem. According to the help,
> you can get alignments from multiple species using this:
>
> [--set_of_species species1:species2:species3:...]
>
> I ran both of your scripts with this command line
>
> perl DumpAlignedGenes.pl --set_of_species "human:rat:mouse"
>
> but I get this error (line numbers based on the script Kathryn sent):
>
> -------------------- WARNING ----------------------
> MSG: No Bio::EnsEMBL::Compara::MethodLinkSpeciesSet found for
>   <EPO_LOW_COVERAGE> and homo_sapiens(GRCh37), rattus_norvegicus(RGSC3.4), mus_musculus(NCBIM37)
> FILE: Compara/DBSQL/MethodLinkSpeciesSetAdaptor.pm LINE: 508
> CALLED BY: DumpAlignedGenes.kathryn.pl  LINE: 385
> Ensembl API version = 67
> ---------------------------------------------------
> -------------------- EXCEPTION --------------------
> MSG: The database do not contain any EPO_LOW_COVERAGE data for human:rat:mouse!
> STACK toplevel DumpAlignedGenes.kathryn.pl:389
> Ensembl API version = 67
> ---------------------------------------------------
>
> This is after setting the alignment_type to "EPO_LOW_COVERAGE" which is what I want to use, but I get a similar error if I use the default alignment_type
> of BLASTZ_NET.
>
> What I ultimately want to do is pull out multiple alignments for all species in the 35 eutherian mammals EPO_LOW_COVERAGE for a set of genes that I am
> interested in analyzing. Can I do that with this script?
>
> Thanks,
>
> Alan
>
>
>
> ________________________________________
> From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] On Behalf Of Kathryn Beal [kbeal at ebi.ac.uk]
> Sent: Thursday, May 31, 2012 5:24 AM
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] Problem with DumpAlignedGenes.pl script
>
> Hi Alan, As you may have noticed, this script has not been updated for some time. I've made various updates and the script now works. I've committed this
> to the cvs ensembl-compara HEAD code. I'll send a separate email with this attached for you.
>
> Cheers
> Kathryn
>
>       Hi,
>
> I am trying to use the DumpAlignedGenes.pl in version 67 of the ensembl-compara API. I have made the following necessary changes to the script
> (attached script has these changes):
>
> 1. When running the script without arguments as a test I get this error:
>
> ------------------ DEPRECATED ---------------------
> Deprecated method call in file DumpAlignedGenes.pl line 365.
> Method Bio::EnsEMBL::DBSQL::MetaContainer::get_Species is deprecated.
> Call is deprecated. Use $self->get_common_name() / $self->get_classification() / $self->get_scientific_name() instead
> Ensembl API version = 67
> ---------------------------------------------------
> -------------------- EXCEPTION --------------------
> MSG: No matches found for name 'Homininae Homo sapiens' and assembly '--undef--'
> STACK Bio::EnsEMBL::Compara::DBSQL::GenomeDBAdaptor::fetch_by_name_assembly /home/rharris1/work/ensembl/api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/GenomeDBAdaptor.pm:131
> STACK toplevel DumpAlignedGenes.pl:369
> Ensembl API version = 67
>
> I changed line 365 from
>
> my $this_binomial_id = $this_meta_container_adaptor->get_Species->binomial;
>
> to
>
> my $this_binomial_id = $this_meta_container_adaptor->get_scientific_name;
> $this_binomial_id =~ s/\s/_/;
>
> which fixes this error.
>
> 2. I also get this similar error:
>
> ------------------ DEPRECATED ---------------------
> Deprecated method call in file DumpAlignedGenes.pl line 436.
> Method Bio::EnsEMBL::DBSQL::MetaContainer::get_Species is deprecated.
> Call is deprecated. Use $self->get_common_name() / $self->get_classification() / $self->get_scientific_name() instead
> Ensembl API version = 67
> ---------------------------------------------------
> -------------------- EXCEPTION --------------------
> MSG: No matches found for name 'Murinae Rattus norvegicus' and assembly '--undef--'
> STACK Bio::EnsEMBL::Compara::DBSQL::GenomeDBAdaptor::fetch_by_name_assembly /home/rharris1/work/ensembl/api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/GenomeDBAdaptor.pm:131
> STACK toplevel DumpAlignedGenes.pl:440
> Ensembl API version = 67
> ---------------------------------------------------
>
> I changed line 436 from
>
> my $source_binomial_id = $meta_container_adaptor->get_Species->binomial;
>
> to
>
> my $source_binomial_id = $meta_container_adaptor->get_scientific_name;
> $source_binomial_id =~ s/\s/_/;
>
> which fixes this error.
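
A note on the substitution used in both fixes: s/\s/_/ replaces only the first whitespace character, which is enough for a two-word name. A slightly more defensive variant, sketched here in plain Perl with a hypothetical helper name_to_id, replaces every run of whitespace, so names with more than two words would also be converted completely. The example names are illustrative inputs, not values taken from the database.

```perl
use strict;
use warnings;

# Turn a scientific name into an underscore-separated identifier.
# The /g flag replaces every run of whitespace, not only the first one.
sub name_to_id {
    my ($scientific_name) = @_;
    (my $id = $scientific_name) =~ s/\s+/_/g;
    return $id;
}

print name_to_id('Homo sapiens'), "\n";            # Homo_sapiens
print name_to_id('Canis lupus familiaris'), "\n";  # Canis_lupus_familiaris
```
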
>
> ---------------------------------------------------------------------
>
> Now I am getting this error:
>
> Can't call method "get_all_Genes" on unblessed reference at DumpAlignedGenes.pl line 463.
>
> Here is line 463:
>
> my $mapped_genes = $align_slice->{'slices'}->{$source_genome_db->name}->get_all_Genes(
>
> >>>How do I fix this error?<<<
>
>
> Thank you for your time.
>
> Alan
>
>
> R. Alan Harris, Ph.D.
> Assistant Professor
> Bioinformatics Research Laboratory
> Epigenomics Data Analysis and Coordination Center
> Department of Molecular and Human Genetics
> Baylor College of Medicine
> Houston, TX 77030
> 713-798-7695
> <DumpAlignedGenes.pl>