[ensembl-dev] get sequence from different build

ian Longden ianl at ebi.ac.uk
Fri Sep 24 15:06:38 BST 2010


The mapper we use to go from one coordinate system to another can only
manage a few steps. We will look into getting a better warning for
this when it is too far but in the mean time you can get the sequence
by projecting the slice first (reducing the number of steps needed
later) :-

If you add the following it should now work.

my $chr_projection = $s36->project('Chromosome','GRCh37');
my $seq = "";
foreach my $segment (@$chr_projection) {
    my ($start, $end, $chr) = @$segment;
    $seq .= $chr->seq;
}

print $seq."\n";



HTH.
-Ian.

On Thu, Sep 23, 2010 at 9:58 AM,  <mailsvl at fastmail.fm> wrote:
> Hi Javier,
>
> What I want is your first example, to get for a region (eg
> chr1:1230-1240):
> 1) the seq of build 36
> 2) the seq of build 37
>
> But the code below always gives me 'NNNNNN' for the 36 build, try this:
>
> # ======================================================
> my $s = 'Human';      # species-name
> my $r = 'chromosome'; # slice region
> my $c = 11;           # chromosome
> my $p = 123000000;    # position
>
> # ======================================================
> my $registry = 'Bio::EnsEMBL::Registry';
> $registry->load_registry_from_db(
>  -host => 'ensembldb.ensembl.org',
>  -user => 'anonymous'
> );
> my $sa = $registry->get_adaptor( $s, 'Core', 'Slice' );
>
> # ======================================================
> my $s36 = $sa->fetch_by_region( $r, $c, $p, $p+20, 1, 'NCBI36' );
> my $s37 = $sa->fetch_by_region( $r, $c, $p, $p+20, 1, 'GRCh37' );
> print $s36->seq."\n";
> print $s37->seq."\n";
>
> /code
>
> Results in:
>> NNNNNNNNNNNNNNNNNNNNN
>> TGCACTCCAGCCTGGGCAATG
>
> Using version ensembl version 59, 58 or 57 all failed...
>
> -Stef
>
> ----- Original message -----
> From: "Javier Herrero" <jherrero at ebi.ac.uk>
> To: dev at ensembl.org
> Date: Thu, 16 Sep 2010 16:27:53 +0100
> Subject: Re: [ensembl-dev] get sequence from different build
>
> Hi Stef
>
> You can use
> $slice_adaptor->fetch_by_region( 'chromosome', '11', 123000000,
> 123000020, 1,
> 'NCBI36' );
>
> $slice_adaptor->fetch_by_region( 'chromosome', '11', 123000000,
> 123000020, 1,
> 'GRCh37' );
>
>
> But probably what you want is to map between coordinates. Here is an
> example:
>
> my $slice = $sa->fetch_by_region( "chromosome", 11, 123000000,
> 123000020, 1,
> "NCBI36");
> print $slice->name, "\n";
> my $proj_segments = $slice->project("chromosome", "GRCh37");
> foreach my $proj_segment (@$proj_segments) {
>  print $proj_segment->to_Slice->name, "\n";
> }
>
> And the same, mapping in the other direction.
>
> my $slice = $sa->fetch_by_region( "chromosome", 11, 123000000,
> 123000020, 1,
> "GRCh37");
> print $slice->name, "\n";
> my $proj_segments = $slice->project("chromosome", "NCBI36");
> foreach my $proj_segment (@$proj_segments) {
>  print $proj_segment->to_Slice->name, "\n";
> }
>
> You can read more about this in the perdoc for the
> Bio::EnsEMBL::Slice->project method.
>
> Javier
>
> On Thursday 16 Sep 2010 15:00:25 mailsvl at fastmail.fm wrote:
>> Ok, so this means you cannot get the sequences of different builds (for
>> the same chromosomal coordinates) within one script? Or is this possible
>> by changing the db version temporarily (with eg reset_DBAdaptor...)?
>>
>> -Stef
>>
>>
>> ----- Original message -----
>> From: "Jan-hinnerk Vogel" <jhv at sanger.ac.uk>
>> To: mailsvl at fastmail.fm
>> Cc: "Ensembl DevList" <dev at ensembl.org>
>> Date: Thu, 16 Sep 2010 14:30:46 +0100
>> Subject: Re: [ensembl-dev] get sequence from different build
>>
>> Hi Stef,
>>
>> ooops - it seems the tutorial is wrong, sorry. We'll correct it asap.
>>
>> What happend is that the 'strand' argument, which should be the 5th
>> argument, slipped somehow.
>> The assembly version should be the 6th argument, not the 5th. A good
>> way to get warned about
>> those things is to  use perl -w or use 'use warnings' in your script -
>> you see something like
>>
>> "Argument "GRCh37" isn't numeric in integer multiplication (*) at /nfs/
>> ensembl/jhv//cvs_checkout//ensembl_live/modules/Bio/EnsEMBL/Mapper.pm
>> line 314." so something is wrong ...
>>
>>
>> http://www.ensembl.org/info/docs/Pdoc/ensembl/modules/Bio/EnsEMBL/DBSQL/Sli
>> ceAdaptor.html#POD15
>>
>>
>> You currently can't fetch sequence data from a different assembly
>> version from the same database  - only the mapping is stored,
>> not the sequence of older assemblies. If you try it, you should get a
>> string of NNNNN's back.
>>
>> To fetch sequence for GRCh37 use the latest human core :
>> homo_sapiens_core_59_37d
>> to fetch seq for NCBI36 you have to use an older database and connect
>> with an older API checkout - ie homo_sapiens_core_***_36*
>>
>>
>> Hth,
>>             Jan-Hinnerk Vogel
>>
>>
>>
>> ==
>> Ensembl Genebuild Team Project Leader
>> www.ensembl.org
>>
>> On 16 Sep 2010, at 12:54, mailsvl at fastmail.fm wrote:
>> > Hi,
>> >
>> > How can I get a slice from a different build then the default? When I
>> > try to compare GCRh37 and NCBI36 I keep getting the same sequences
>> > (from
>> > GCRh37), no matter what region I try.
>>
>> I assume you mean i "GRCh37" ?
>>
>> > The code I use is from:
>> > http://www.ensembl.org/info/docs/api/core/core_tutorial.html
>> >
>> > Code:
>> > - $slice_adaptor->fetch_by_region( 'chromosome', '11', 123000000,
>> > 123000020, 'NCBI36' ); or
>> > - $slice_adaptor->fetch_by_region( 'chromosome', '11', 123000000,
>> > 123000020, 1, 'NCBI36' );
>> >
>> > Results when using ensembl website:
>> >> 11 dna:chromosome chromosome:NCBI36:11:123000000:123000020:1
>> >
>> > AAGAGTCCCACCAGCTCCAGG
>> >
>> >> 11 dna:chromosome chromosome:GRCh37:11:123000000:123000020:1
>> >
>> > TGCACTCCAGCCTGGGCAATG
>> >
>> > Thanks,
>> > Stef
>> > __________________
>> > http://www.fastmail.fm
>> >
>> >
>> > _______________________________________________
>> > Dev mailing list
>> > Dev at ensembl.org
>> > http://lists.ensembl.org/mailman/listinfo/dev
>>
>> __________________
>> http://www.fastmail.fm
>>
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev
>
> --
> Javier Herrero, PhD
> Ensembl Compara Project Leader
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge - CB10 1SD - UK
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev
>
> __________________
> http://www.fastmail.fm
>
>
> _______________________________________________
> Dev mailing list
> Dev at ensembl.org
> http://lists.ensembl.org/mailman/listinfo/dev
>




More information about the Dev mailing list