[ensembl-dev] get sequence from different build
mailsvl at fastmail.fm
mailsvl at fastmail.fm
Thu Sep 23 09:58:59 BST 2010
Hi Javier,
What I want is your first example, to get for a region (eg
chr1:1230-1240):
1) the seq of build 36
2) the seq of build 37
But the code below always gives me 'NNNNNN' for the 36 build, try this:
# ======================================================
my $s = 'Human'; # species-name
my $r = 'chromosome'; # slice region
my $c = 11; # chromosome
my $p = 123000000; # position
# ======================================================
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
-host => 'ensembldb.ensembl.org',
-user => 'anonymous'
);
my $sa = $registry->get_adaptor( $s, 'Core', 'Slice' );
# ======================================================
my $s36 = $sa->fetch_by_region( $r, $c, $p, $p+20, 1, 'NCBI36' );
my $s37 = $sa->fetch_by_region( $r, $c, $p, $p+20, 1, 'GRCh37' );
print $s36->seq."\n";
print $s37->seq."\n";
/code
Results in:
> NNNNNNNNNNNNNNNNNNNNN
> TGCACTCCAGCCTGGGCAATG
Using version ensembl version 59, 58 or 57 all failed...
-Stef
----- Original message -----
From: "Javier Herrero" <jherrero at ebi.ac.uk>
To: dev at ensembl.org
Date: Thu, 16 Sep 2010 16:27:53 +0100
Subject: Re: [ensembl-dev] get sequence from different build
Hi Stef
You can use
$slice_adaptor->fetch_by_region( 'chromosome', '11', 123000000,
123000020, 1,
'NCBI36' );
$slice_adaptor->fetch_by_region( 'chromosome', '11', 123000000,
123000020, 1,
'GRCh37' );
But probably what you want is to map between coordinates. Here is an
example:
my $slice = $sa->fetch_by_region( "chromosome", 11, 123000000,
123000020, 1,
"NCBI36");
print $slice->name, "\n";
my $proj_segments = $slice->project("chromosome", "GRCh37");
foreach my $proj_segment (@$proj_segments) {
print $proj_segment->to_Slice->name, "\n";
}
And the same, mapping in the other direction.
my $slice = $sa->fetch_by_region( "chromosome", 11, 123000000,
123000020, 1,
"GRCh37");
print $slice->name, "\n";
my $proj_segments = $slice->project("chromosome", "NCBI36");
foreach my $proj_segment (@$proj_segments) {
print $proj_segment->to_Slice->name, "\n";
}
You can read more about this in the perdoc for the
Bio::EnsEMBL::Slice->project method.
Javier
On Thursday 16 Sep 2010 15:00:25 mailsvl at fastmail.fm wrote:
> Ok, so this means you cannot get the sequences of different builds (for
> the same chromosomal coordinates) within one script? Or is this possible
> by changing the db version temporarily (with eg reset_DBAdaptor...)?
>
> -Stef
>
>
> ----- Original message -----
> From: "Jan-hinnerk Vogel" <jhv at sanger.ac.uk>
> To: mailsvl at fastmail.fm
> Cc: "Ensembl DevList" <dev at ensembl.org>
> Date: Thu, 16 Sep 2010 14:30:46 +0100
> Subject: Re: [ensembl-dev] get sequence from different build
>
> Hi Stef,
>
> ooops - it seems the tutorial is wrong, sorry. We'll correct it asap.
>
> What happend is that the 'strand' argument, which should be the 5th
> argument, slipped somehow.
> The assembly version should be the 6th argument, not the 5th. A good
> way to get warned about
> those things is to use perl -w or use 'use warnings' in your script -
> you see something like
>
> "Argument "GRCh37" isn't numeric in integer multiplication (*) at /nfs/
> ensembl/jhv//cvs_checkout//ensembl_live/modules/Bio/EnsEMBL/Mapper.pm
> line 314." so something is wrong ...
>
>
> http://www.ensembl.org/info/docs/Pdoc/ensembl/modules/Bio/EnsEMBL/DBSQL/Sli
> ceAdaptor.html#POD15
>
>
> You currently can't fetch sequence data from a different assembly
> version from the same database - only the mapping is stored,
> not the sequence of older assemblies. If you try it, you should get a
> string of NNNNN's back.
>
> To fetch sequence for GRCh37 use the latest human core :
> homo_sapiens_core_59_37d
> to fetch seq for NCBI36 you have to use an older database and connect
> with an older API checkout - ie homo_sapiens_core_***_36*
>
>
> Hth,
> Jan-Hinnerk Vogel
>
>
>
> ==
> Ensembl Genebuild Team Project Leader
> www.ensembl.org
>
> On 16 Sep 2010, at 12:54, mailsvl at fastmail.fm wrote:
> > Hi,
> >
> > How can I get a slice from a different build then the default? When I
> > try to compare GCRh37 and NCBI36 I keep getting the same sequences
> > (from
> > GCRh37), no matter what region I try.
>
> I assume you mean i "GRCh37" ?
>
> > The code I use is from:
> > http://www.ensembl.org/info/docs/api/core/core_tutorial.html
> >
> > Code:
> > - $slice_adaptor->fetch_by_region( 'chromosome', '11', 123000000,
> > 123000020, 'NCBI36' ); or
> > - $slice_adaptor->fetch_by_region( 'chromosome', '11', 123000000,
> > 123000020, 1, 'NCBI36' );
> >
> > Results when using ensembl website:
> >> 11 dna:chromosome chromosome:NCBI36:11:123000000:123000020:1
> >
> > AAGAGTCCCACCAGCTCCAGG
> >
> >> 11 dna:chromosome chromosome:GRCh37:11:123000000:123000020:1
> >
> > TGCACTCCAGCCTGGGCAATG
> >
> > Thanks,
> > Stef
> > __________________
> > http://www.fastmail.fm
> >
> >
> > _______________________________________________
> > Dev mailing list
> > Dev at ensembl.org
> > http://lists.ensembl.org/mailman/listinfo/dev
>
> __________________
> http://www.fastmail.fm
>
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev
--
Javier Herrero, PhD
Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
_______________________________________________
Dev mailing list
Dev at ensembl.org
http://lists.ensembl.org/mailman/listinfo/dev
__________________
http://www.fastmail.fm
More information about the Dev
mailing list