[ensembl-dev] Problem with DumpAlignedGenes.pl script

Kathryn Beal kbeal at ebi.ac.uk
Fri Jun 15 07:03:19 BST 2012


Hi Alan,
The EPO_LOW_COVERAGE alignments can be downloaded from the following ftp site:

ftp://ftp.ensembl.org/pub/release-67/emf/ensembl-compara/epo_35_eutherian/
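
To pull down the whole directory in one go, something like the following may work. It is only a sketch: it assumes GNU wget is installed, and the helper name fetch_epo_alignments, the DRY_RUN switch and the default target directory are made up for illustration (only the URL comes from this message).

```shell
#!/bin/sh
# URL copied from the message above.
EPO_URL="ftp://ftp.ensembl.org/pub/release-67/emf/ensembl-compara/epo_35_eutherian/"

# fetch_epo_alignments DIR
# Hypothetical helper: mirrors every file under EPO_URL into DIR.
#   -r   recurse into the FTP directory
#   -np  never ascend to the parent directory
#   -nd  do not recreate the remote directory tree locally
#   -P   directory to save the files into
# With DRY_RUN=1 the command is only printed, not executed.
fetch_epo_alignments() {
    dest="${1:-epo_35_eutherian}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "wget -r -np -nd -P $dest $EPO_URL"
    else
        wget -r -np -nd -P "$dest" "$EPO_URL"
    fi
}

# Example: print the command first, then run it for real.
# DRY_RUN=1 fetch_epo_alignments epo_35_eutherian
# fetch_epo_alignments epo_35_eutherian
```

Running it with DRY_RUN=1 first shows exactly what would be fetched before committing to a genome-wide download.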

Cheers
Kathryn


> Hi Stephen,
> 
> Thanks a lot. The basic_ma.pl you attached worked great for downloading small regions. I need to download genome-wide EPO_LOW_COVERAGE alignments for some analyses we are trying to do on the gibbon genome. I was hoping to use the script to download alignments for each chromosome, but when I tried that with chr22 I got a "DBD::mysql::st execute failed: Lost connection to MySQL server during query" error. It looks like whole chromosomes cannot easily be downloaded via the API. Is there somewhere that I can download the EPO_LOW_COVERAGE alignments as text files? I searched online and have not been able to find a place to download them.
>  
> Thanks,
>  
> Alan 
> ________________________________________
> From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] On Behalf Of Stephen Fitzgerald [stephenf at ebi.ac.uk]
> Sent: Thursday, June 14, 2012 9:36 AM
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] Problem with DumpAlignedGenes.pl script
> 
> Hi Alan, I've committed some changes to DumpAlignedGenes.pl in cvs
> (head) which should make it easier to use for retrieving multiple
> alignments. The --set_of_species flag is used to find alignments
> consisting of that particular set of species (this is why you see the
> exception below: the EPO_LOW_COVERAGE alignments consist of 35 species
> and not just human:rat:mouse).
> I've added a new flag "--species_set_name" which will make it easier to
> get alignments from the EPO_LOW_COVERAGE set.
> 
> e.g.
> perl DumpAlignedGenes.pl --alignment_type EPO_LOW_COVERAGE \
>   --species_set_name mammals --seq_region 8 --seq_region_start 21180813 \
>   --seq_region_end 21188452 --species rat --genes_from chimp
> 
> I've also attached a basic script which will allow you to get all the
> alignments from the EPO_LOW_COVERAGE set when you define a reference
> species and coordinates (I've hard-coded "sus_scrofa" as the reference).
> 
> You may also find some parts of this script of use:
> http://www.ebi.ac.uk/~stephenf/Workshops/Cambridge_march2012/Exercises/Solutions/align_slice.txt
> 
> (It prints the nucleotide bases and assembly positions for both the pig
> myostatin (MSTN) gene and its bovine ortholog where the aligned sequences
> differ)
> 
> and the output from it is here:
> http://www.ebi.ac.uk/~stephenf/Workshops/Cambridge_march2012/Exercises/Solutions/align_slice.out.txt
> 
> If you have any more questions, get back in touch.
> 
> All the best,
> Stephen.
> 
> 
> 
> 
> On Wed, 13 Jun 2012, Harris, Ronald Alan wrote:
> 
> > Hi Kathryn and Stephen,
> >
> > Thanks for helping me out with DumpAlignedGenes.pl. The scripts that both of you sent me now work, but there is another problem. According to the help,
> > you can get alignments from multiple species using this:
> >
> > [--set_of_species species1:species2:species3:...]
> >
> > I ran both of your scripts with this command line
> >
> > perl DumpAlignedGenes.pl --set_of_species "human:rat:mouse"
> >
> > but I get this error (line numbers based on the script Kathryn sent):
> >
> > -------------------- WARNING ----------------------
> > MSG: No Bio::EnsEMBL::Compara::MethodLinkSpeciesSet found for
> >   <EPO_LOW_COVERAGE> and homo_sapiens(GRCh37), rattus_norvegicus(RGSC3.4), mus_musculus(NCBIM37)
> > FILE: Compara/DBSQL/MethodLinkSpeciesSetAdaptor.pm LINE: 508
> > CALLED BY: DumpAlignedGenes.kathryn.pl  LINE: 385
> > Ensembl API version = 67
> > ---------------------------------------------------
> > -------------------- EXCEPTION --------------------
> > MSG: The database do not contain any EPO_LOW_COVERAGE data for human:rat:mouse!
> > STACK toplevel DumpAlignedGenes.kathryn.pl:389
> > Ensembl API version = 67
> > ---------------------------------------------------
> >
> > This is after setting the alignment_type to "EPO_LOW_COVERAGE" which is what I want to use, but I get a similar error if I use the default alignment_type
> > of BLASTZ_NET.
> >
> > What I ultimately want to do is pull out multiple alignments for all species in the 35 eutherian mammals EPO_LOW_COVERAGE for a set of genes that I am
> > interested in analyzing. Can I do that with this script?
> >
> > Thanks,
> >
> > Alan
> >
> >
> >
> > ________________________________________
> > From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] On Behalf Of Kathryn Beal [kbeal at ebi.ac.uk]
> > Sent: Thursday, May 31, 2012 5:24 AM
> > To: Ensembl developers list
> > Subject: Re: [ensembl-dev] Problem with DumpAlignedGenes.pl script
> >
> > Hi Alan, As you may have noticed, this script has not been updated for some time. I've made various updates and the script now works. I've committed this
> > to the cvs ensembl-compara HEAD code. I'll send a separate email with this attached for you.
> >
> > Cheers
> > Kathryn
> >
> >       Hi,
> >
> > I am trying to use the DumpAlignedGenes.pl in version 67 of the ensembl-compara API. I have made the following necessary changes to the script
> > (attached script has these changes):
> >
> > 1. When running the script without arguments as a test I get this error:
> >
> > ------------------ DEPRECATED ---------------------
> > Deprecated method call in file DumpAlignedGenes.pl line 365.
> > Method Bio::EnsEMBL::DBSQL::MetaContainer::get_Species is deprecated.
> > Call is deprecated. Use $self->get_common_name() / $self->get_classification() / $self->get_scientific_name() instead
> > Ensembl API version = 67
> > ---------------------------------------------------
> > -------------------- EXCEPTION --------------------
> > MSG: No matches found for name 'Homininae Homo sapiens' and assembly '--undef--'
> > STACK Bio::EnsEMBL::Compara::DBSQL::GenomeDBAdaptor::fetch_by_name_assembly /home/rharris1/work/ensembl/api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/GenomeDBAdaptor.pm:131
> > STACK toplevel DumpAlignedGenes.pl:369
> > Ensembl API version = 67
> >
> > I changed line 365 from
> >
> > my $this_binomial_id = $this_meta_container_adaptor->get_Species->binomial;
> >
> > to
> >
> > my $this_binomial_id = $this_meta_container_adaptor->get_scientific_name;
> > $this_binomial_id =~ s/\s/_/;
> >
> > which fixes this error.
> >
> > 2. I also get this similar error:
> >
> > ------------------ DEPRECATED ---------------------
> > Deprecated method call in file DumpAlignedGenes.pl line 436.
> > Method Bio::EnsEMBL::DBSQL::MetaContainer::get_Species is deprecated.
> > Call is deprecated. Use $self->get_common_name() / $self->get_classification() / $self->get_scientific_name() instead
> > Ensembl API version = 67
> > ---------------------------------------------------
> > -------------------- EXCEPTION --------------------
> > MSG: No matches found for name 'Murinae Rattus norvegicus' and assembly '--undef--'
> > STACK Bio::EnsEMBL::Compara::DBSQL::GenomeDBAdaptor::fetch_by_name_assembly /home/rharris1/work/ensembl/api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/GenomeDBAdaptor.pm:131
> > STACK toplevel DumpAlignedGenes.pl:440
> > Ensembl API version = 67
> > ---------------------------------------------------
> >
> > I changed line 436 from
> >
> > my $source_binomial_id = $meta_container_adaptor->get_Species->binomial;
> >
> > to
> >
> > my $source_binomial_id = $meta_container_adaptor->get_scientific_name;
> > $source_binomial_id =~ s/\s/_/;
> >
> > which fixes this error.
> >
> > ---------------------------------------------------------------------
> >
> > Now I am getting this error:
> >
> > Can't call method "get_all_Genes" on unblessed reference at DumpAlignedGenes.pl line 463.
> >
> > Here is line 463:
> >
> > my $mapped_genes = $align_slice->{'slices'}->{$source_genome_db->name}->get_all_Genes(
> >
> > >>>How do I fix this error?<<<
> >
> >
> > Thank you for your time.
> >
> > Alan
> >
> >
> > R. Alan Harris, Ph.D.
> > Assistant Professor
> > Bioinformatics Research Laboratory
> > Epigenomics Data Analysis and Coordination Center
> > Department of Molecular and Human Genetics
> > Baylor College of Medicine
> > Houston, TX 77030
> > 713-798-7695
> > <DumpAlignedGenes.pl>
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/
> >
> >
> >
> >
