[ensembl-dev] get sequence from different build

Ian Longden ianl at ebi.ac.uk
Fri Sep 24 15:28:25 BST 2010


Ah yes, I did my test on another region where there was sequence.
Here is the full code and an example:

-----------------------------------------------------------------------------------------------------
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
my $s = 'Human';      # species-name
my $r = 'chromosome'; # slice region
my $c = 11;           # chromosome


#my $p = 123000000;    # position
my $p = 60001;
# ======================================================
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);


my $sa = $registry->get_adaptor( $s, 'Core', 'Slice' );

# ======================================================

my $s36 = $sa->fetch_by_region( $r, $c, $p, $p+20, 1, 'NCBI36' );
my $s37 = $sa->fetch_by_region( $r, $c, $p, $p+20, 1, 'GRCh37' );

print "Seq for 37:-\n";
print $s37->seq."\n";

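# Project the NCBI36 slice onto GRCh37; each segment maps one piece of it.
my $chr_projection = $s36->project('chromosome', 'GRCh37');
my $seq = "";
foreach my $segment (@$chr_projection) {
   # to_Slice is the GRCh37 slice that this piece of the NCBI36 slice maps to,
   # so the sequence is read from there.
   $seq .= $segment->to_Slice->seq;
}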

print "Seq for 36:-\n";
print $seq."\n";

------------------------------------------------------------------------------------------------
Giving:-

Seq for 37:-
GAATTCTACATTAGAAAAATA
Seq for 36:-
AGGCAGAGGTCAAAGTGAGCC
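
Note that the loop above just concatenates whatever segments project() returns, so any part of the NCBI36 slice that has no GRCh37 mapping would be silently left out of the result. Below is a minimal sketch of how one might spot such gaps from the segment coordinates. It uses plain [from_start, from_end] pairs rather than real projection segments so that it runs without a database connection; the helper name find_gaps and the sample coordinates are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): given the length of the
# source slice and a list of [from_start, from_end] pairs (1-based, relative
# to the source slice), return the ranges not covered by any pair.
sub find_gaps {
    my ($length, @segments) = @_;
    my @gaps;
    my $next = 1;    # first position not yet covered
    foreach my $seg (sort { $a->[0] <=> $b->[0] } @segments) {
        my ($from_start, $from_end) = @$seg;
        push @gaps, [$next, $from_start - 1] if $from_start > $next;
        $next = $from_end + 1 if $from_end >= $next;
    }
    push @gaps, [$next, $length] if $next <= $length;
    return @gaps;
}

# Made-up example: a 21 bp slice where positions 8..12 did not map.
my @gaps = find_gaps(21, [1, 7], [13, 21]);
print join(', ', map { "$_->[0]-$_->[1]" } @gaps), "\n";
```

With the real API the pairs would presumably be built from $segment->from_start and $segment->from_end for each element of @$chr_projection, and the length from $s36->length.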



Cheers,
Ian.


On Fri, Sep 24, 2010 at 3:17 PM, Hiram Clawson <hiram at soe.ucsc.edu> wrote:
> The first 10,000 bases of GRCh37 are all "N" (telomere gap).
> There is actual sequence there in build 36 which, when translated
> to GRCh37, starts at position 10,001.
>
> --Hiram
>
> On Thu, Sep 23, 2010 at 9:58 AM,  <mailsvl at fastmail.fm> wrote:
>> Hi Javier,
>>
>> What I want is your first example, to get for a region (eg
>> chr1:1230-1240):
>> 1) the seq of build 36
>> 2) the seq of build 37
>>
>> But the code below always gives me 'NNNNNN' for the 36 build, try this:
>>
>> # ======================================================
>> my $s = 'Human';      # species-name
>> my $r = 'chromosome'; # slice region
>> my $c = 11;           # chromosome
>> my $p = 123000000;    # position
>>
>> # ======================================================
>> my $registry = 'Bio::EnsEMBL::Registry';
>> $registry->load_registry_from_db(
>>  -host => 'ensembldb.ensembl.org',
>>  -user => 'anonymous'
>> );
>> my $sa = $registry->get_adaptor( $s, 'Core', 'Slice' );
>>
>> # ======================================================
>> my $s36 = $sa->fetch_by_region( $r, $c, $p, $p+20, 1, 'NCBI36' );
>> my $s37 = $sa->fetch_by_region( $r, $c, $p, $p+20, 1, 'GRCh37' );
>> print $s36->seq."\n";
>> print $s37->seq."\n";
>>
>> Results in:
>>> NNNNNNNNNNNNNNNNNNNNN
>>> TGCACTCCAGCCTGGGCAATG
>>
>> Using Ensembl version 59, 58 or 57 all failed...
>>
>> -Stef