[ensembl-dev] getting sequences from older assemblies using the Perl API

PATERSON Trevor trevor.paterson at roslin.ed.ac.uk
Thu Aug 21 10:48:52 BST 2014


Users wishing to retrieve sequences from assembly versions in previous releases of Ensembl might be interested in using the JEnsembl Java API.

( freely available from http://jensembl.sourceforge.net/ )

One of the useful features of JEnsembl is its backwards compatibility and version awareness.

Code example using the current release 75 of JEnsembl (release 76 is coming soon...)

DBRegistry eReg = DBRegistry.createRegistryForDataSource(DBConnection.DataSource.ENSEMBLDB);
DBSpecies mouse = eReg.getSpeciesByAlias("mouse");

 System.out.println("Release 67 Assembly is " +mouse.getAssemblyName("67"));
DAChromosome chr7v67 = mouse.getChromosomeByName("7", "67");
System.out.println("chr7v67 (123336212-123338053): "+chr7v67.getSequenceAsString(123336212 , 123338053 ));

System.out.println("Release 75 Assembly is " +mouse.getAssemblyName("75"));
DAChromosome chr7v75 = mouse.getChromosomeByName("7", "75");
System.out.println("chr7v75 (123336212-123338053): "+chr7v75.getSequenceAsString(123336212 , 123338053 ));

Outputs

Release 67 Assembly is NCBIM37
chr7v67 (123336212-123338053): TCCATTGAGAAAGGAGACAGGCCTGCAGGGAT ... etc

Release 75 Assembly is GRCm38.p2
chr7v75 (123336212-123338053): AGGTGCACACTCTGTAAGGAAGGAAACCCAGT ... etc

JEnsembl can connect to any Ensembl datasource including those hosted by Ensembl and EnsemblGenomes (invertebrates, plants, bacteria etc...)


Trevor Paterson PhD
trevor.paterson at roslin.ed.ac.uk<mailto:trevor.paterson at roslin.ed.ac.uk>
Bioinformatics
The Roslin Institute
Royal (Dick) School of Veterinary Studies
University of Edinburgh
Easter Bush
Midlothian
EH25 9RG
Scotland UK

phone +44 (0)131 651 9157

http://bioinformatics.roslin.ed.ac.uk/

Please consider the environment before printing this e-mail The University of Edinburgh is a charitable body, registered in Scotland with registration number SC005336 Disclaimer:This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender.

From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Ashley Waardenberg
Sent: 21 August 2014 07:13
To: dev at ensembl.org
Subject: [ensembl-dev] getting sequences from older assemblies using the Perl API

If I select any other assembly version (below as variable "$version") other than the highest ranked, I seem to only get "N" as sequences returned. I have tried this for both mouse and human. This is using the perl API version 75, below is an example querying a sequence at some specific coordinate. If I retrieve the soft-masked region, it returns lower cases where expected, but is still all "n". If I change the assembly to highest rank (e.g. GRCm38)  it retrieves "proper" sequence, but obviously at the wrong coordinates as I am using NCBIM37 fragments.

Is this a possible bug/any ideas?

Ash

#code start

use Bio::EnsEMBL::Registry;



my $registry = "Bio::EnsEMBL::Registry";

my $species = "mouse";

my $version = "NCBIM37";



$registry->load_registry_from_db(

    -host =>  'ensembldb.ensembl.org',

    -user =>  'anonymous'

);



my $adaptor = $registry->get_adaptor($species,'Core','Slice');

#slice of interest:

my $slice = $adaptor->fetch_by_region('chromosome', 7 , 123336212 , 123338053 , 1, $version);

#different elements of the slice:

my $sequence = $slice->seq();

my $softmasked = $slice->get_repeatmasked_seq( undef, 1 );

my $softmasked_seq = $softmasked->seq();



#print sliced sequences:

print $sequence."\n"."\n"."\n";

print $softmasked_seq."\n";
#code end


#this prints the following out:



NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN





NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNnnnnnnnnnnnnnnnnnnnnnnnnnnnNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

______________________________________________________________________
The Victor Chang Cardiac Research Institute's Email is scanned by the MessageLabs Email Security System.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20140821/2911a8f6/attachment.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20140821/2911a8f6/attachment.ksh>


More information about the Dev mailing list