[ensembl-dev] Problem with DumpAlignedGenes.pl script

Stephen Fitzgerald stephenf at ebi.ac.uk
Thu Jun 14 15:36:50 BST 2012


Hi Alan, I've committed some changes to DumpAlignedGenes.pl in cvs 
(HEAD) which should make it easier to use for retrieving multiple 
alignments. The --set_of_species flag is used to find alignments 
consisting of exactly that set of species (this is why you see the 
exception below: the EPO_LOW_COVERAGE alignments consist of 35 species, 
not just human:rat:mouse).
I've added a new flag "--species_set_name" which will make it easier to 
get alignments from the EPO_LOW_COVERAGE set.

e.g.
perl DumpAlignedGenes.pl --alignment_type EPO_LOW_COVERAGE \
  --species_set_name mammals --seq_region 8 --seq_region_start 21180813 \
  --seq_region_end 21188452 --species rat --genes_from chimp
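
To illustrate why --set_of_species "human:rat:mouse" finds nothing for 
EPO_LOW_COVERAGE, here is a toy sketch of exact-set matching. It does not 
use the Compara API: the %alignments table, species_key() and 
has_alignment() are made up for this example, and the species list is a 
shortened stand-in for the real 35-species set.

```perl
use strict;
use warnings;

# Toy stand-in for the stored alignment sets. The species list is a
# shortened, illustrative example; the real EPO_LOW_COVERAGE set has
# 35 species.
my %alignments = (
    'EPO_LOW_COVERAGE' => [qw(human chimp mouse rat pig cow dog)],
);

# Build an order-independent key for a set of species names.
sub species_key {
    my @species = @_;
    return join ':', sort map { lc } @species;
}

# Return 1 only when the requested set is exactly the stored set,
# 0 otherwise (a subset such as human:rat:mouse does not match).
sub has_alignment {
    my ($type, @species) = @_;
    my $stored = $alignments{$type} or return 0;
    return species_key(@$stored) eq species_key(@species) ? 1 : 0;
}

for my $query ('human:rat:mouse', 'human:chimp:mouse:rat:pig:cow:dog') {
    my $found = has_alignment('EPO_LOW_COVERAGE', split /:/, $query);
    printf "%-36s %s\n", $query, $found ? 'found' : 'not found';
}
```

A subset of the stored species is reported as "not found", which mirrors 
the exception in your output. Naming the whole set (as --species_set_name 
mammals does) avoids having to list every species.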

I've also attached a basic script which will allow you to get all the 
alignments from the EPO_LOW_COVERAGE set when you define a reference 
species and coordinates (I've hard-coded "sus_scrofa" as the reference).

You may also find some parts of this script of use:
http://www.ebi.ac.uk/~stephenf/Workshops/Cambridge_march2012/Exercises/Solutions/align_slice.txt

(It prints the nucleotide bases and assembly positions for both the pig 
myostatin (MSTN) gene and its bovine ortholog where the aligned sequences 
differ)

and the output from it is here:
http://www.ebi.ac.uk/~stephenf/Workshops/Cambridge_march2012/Exercises/Solutions/align_slice.out.txt
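
The core idea of that comparison can be sketched without the API. This is 
a minimal illustration, not the linked script: aligned_differences() is a 
hypothetical helper, the two sequences are made-up toy strings rather than 
real MSTN data, and it reports alignment columns only (mapping columns 
back to assembly positions needs the API and is left out).

```perl
use strict;
use warnings;

# Return one [column, base_a, base_b] triple for every alignment column
# where the two aligned sequences differ. Columns are 1-based and '-'
# marks a gap. Both strings must have the same length.
sub aligned_differences {
    my ($seq_a, $seq_b) = @_;
    die "Aligned sequences must have the same length\n"
        unless length($seq_a) == length($seq_b);

    my @diffs;
    for my $i (0 .. length($seq_a) - 1) {
        my $base_a = uc substr($seq_a, $i, 1);
        my $base_b = uc substr($seq_b, $i, 1);
        next if $base_a eq $base_b;
        push @diffs, [ $i + 1, $base_a, $base_b ];
    }
    return @diffs;
}

# Made-up toy alignment, for illustration only.
my $pig = 'ATGCA-GTTC';
my $cow = 'ATGTAAGTTC';

for my $diff (aligned_differences($pig, $cow)) {
    printf "column %d: %s vs %s\n", @$diff;
}
```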

If you have any more questions, get back in touch.

All the best,
Stephen.




On Wed, 13 Jun 2012, Harris, Ronald Alan wrote:

> Hi Kathryn and Stephen,
>  
> Thanks for helping me out with DumpAlignedGenes.pl. The scripts that both of you sent me now work, but there is another problem. According to the help,
> you can get alignments from multiple species using this:
>  
> [--set_of_species species1:species2:species3:...]
>  
> I ran both of your scripts with this command line
>  
> perl DumpAlignedGenes.pl --set_of_species "human:rat:mouse"
>  
> but I get this error (line numbers based on the script Kathryn sent):
>  
> -------------------- WARNING ----------------------
> MSG: No Bio::EnsEMBL::Compara::MethodLinkSpeciesSet found for
>   <EPO_LOW_COVERAGE> and homo_sapiens(GRCh37), rattus_norvegicus(RGSC3.4), mus_musculus(NCBIM37)
> FILE: Compara/DBSQL/MethodLinkSpeciesSetAdaptor.pm LINE: 508
> CALLED BY: DumpAlignedGenes.kathryn.pl  LINE: 385
> Ensembl API version = 67
> ---------------------------------------------------
> -------------------- EXCEPTION --------------------
> MSG: The database do not contain any EPO_LOW_COVERAGE data for human:rat:mouse!
> STACK toplevel DumpAlignedGenes.kathryn.pl:389
> Ensembl API version = 67
> ---------------------------------------------------
>  
> This is after setting the alignment_type to "EPO_LOW_COVERAGE" which is what I want to use, but I get a similar error if I use the default alignment_type
> of BLASTZ_NET.
>  
> What I ultimately want to do is pull out multiple alignments for all species in the 35 eutherian mammals EPO_LOW_COVERAGE for a set of genes that I am
> interested in analyzing. Can I do that with this script?
>  
> Thanks,
>  
> Alan
>  
>  
> 
> _________________________________________________________________________________________________________________________________________________________
> From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] On Behalf Of Kathryn Beal [kbeal at ebi.ac.uk]
> Sent: Thursday, May 31, 2012 5:24 AM
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] Problem with DumpAlignedGenes.pl script
> 
> Hi Alan, As you may have noticed, this script has not been updated for some time. I've made various updates and the script now works. I've committed this
> to the cvs ensembl-compara HEAD code. I'll send a separate email with this attached for you.
> 
> Cheers
> Kathryn
>
> Hi,
>  
> I am trying to use the DumpAlignedGenes.pl in version 67 of the ensembl-compara API. I have made the following necessary changes to the script
> (attached script has these changes):
>  
> 1. When running the script without arguments as a test I get this error:
>  
> ------------------ DEPRECATED ---------------------
> Deprecated method call in file DumpAlignedGenes.pl line 365.
> Method Bio::EnsEMBL::DBSQL::MetaContainer::get_Species is deprecated.
> Call is deprecated. Use $self->get_common_name() / $self->get_classification() / $self->get_scientific_name() instead
> Ensembl API version = 67
> ---------------------------------------------------
> -------------------- EXCEPTION --------------------
> MSG: No matches found for name 'Homininae Homo sapiens' and assembly '--undef--'
> STACK Bio::EnsEMBL::Compara::DBSQL::GenomeDBAdaptor::fetch_by_name_assembly /home/rharris1/work/ensembl/api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/GenomeDBAdaptor.pm:131
> STACK toplevel DumpAlignedGenes.pl:369
> Ensembl API version = 67
>  
> I changed line 365 from
>  
> my $this_binomial_id = $this_meta_container_adaptor->get_Species->binomial;
>  
> to
>  
> my $this_binomial_id = $this_meta_container_adaptor->get_scientific_name;
> $this_binomial_id =~ s/\s/_/;
>  
> which fixes this error.
>  
> 2. I also get this similar error:
>  
> ------------------ DEPRECATED ---------------------
> Deprecated method call in file DumpAlignedGenes.pl line 436.
> Method Bio::EnsEMBL::DBSQL::MetaContainer::get_Species is deprecated.
> Call is deprecated. Use $self->get_common_name() / $self->get_classification() / $self->get_scientific_name() instead
> Ensembl API version = 67
> ---------------------------------------------------
> -------------------- EXCEPTION --------------------
> MSG: No matches found for name 'Murinae Rattus norvegicus' and assembly '--undef--'
> STACK Bio::EnsEMBL::Compara::DBSQL::GenomeDBAdaptor::fetch_by_name_assembly /home/rharris1/work/ensembl/api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/GenomeDBAdaptor.pm:131
> STACK toplevel DumpAlignedGenes.pl:440
> Ensembl API version = 67
> ---------------------------------------------------
>  
> I changed line 436 from
>  
> my $source_binomial_id = $meta_container_adaptor->get_Species->binomial;
>  
> to
>  
> my $source_binomial_id = $meta_container_adaptor->get_scientific_name;
> $source_binomial_id =~ s/\s/_/;
>  
> which fixes this error.
>  
> ---------------------------------------------------------------------
>  
> Now I am getting this error:
>  
> Can't call method "get_all_Genes" on unblessed reference at DumpAlignedGenes.pl line 463.
>  
> Here is line 463:
>  
> my $mapped_genes = $align_slice->{'slices'}->{$source_genome_db->name}->get_all_Genes(
>  
> >>>How do I fix this error?<<<
>  
>  
> Thank you for your time.
>  
> Alan
>  
>  
> R. Alan Harris, Ph.D.
> Assistant Professor
> Bioinformatics Research Laboratory
> Epigenomics Data Analysis and Coordination Center
> Department of Molecular and Human Genetics
> Baylor College of Medicine
> Houston, TX 77030
> 713-798-7695
> <DumpAlignedGenes.pl>_______________________________________________
-------------- next part --------------
A non-text attachment was scrubbed...
Name: basic_ma.pl
Type: application/x-perl
Size: 2058 bytes
Desc: 
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20120614/b75d3e45/attachment.pl>

