[ensembl-dev] Liftover script from NCBI36 to GRCh37

Chris Penkett cjp64 at cam.ac.uk
Wed Sep 17 16:50:01 BST 2014


Hi Ensembl developers,

I've been using a script (see below) with the Ensembl API to do liftover 
of coordinates between different genome versions, but it no longer seems 
to work for moving NCBI36 to GRCh37.  Is it still possible to do this 
conversion with the new genome reference?

Output of script for different genome asssembly liftovers:

% ./ens_db_lift.pl NCBI36 GRCh37 chr1:10000-20000
Can't call method "map" on an undefined value at ./ens_db_lift.pl line 42.

% ./ens_db_lift.pl NCBI36 GRCh38 chr1:10000-20000
1, 10000, 20000 --> 1    20137    30137

% ./ens_db_lift.pl GRCh37 GRCh38 chr1:10000-20000
1, 10000, 20000 --> 1    10001    20000


#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

use lib "/src/ensembl/bioperl-1.2.3";
use lib "/home/cjp64/src/ensembl/ensembl/modules";

use Bio::EnsEMBL::DBSQL::DBAdaptor;
use Bio::EnsEMBL::AssemblyMapper;
use Bio::EnsEMBL::Mapper::Coordinate;
use Bio::EnsEMBL::DBSQL::SliceAdaptor;

my $from  = shift;
my $to    = shift;
my $coord = shift;

my $species = "human";
my $host    = 'ensembldb.ensembl.org';
my $user    = 'anonymous';

my $reg = 'Bio::EnsEMBL::Registry';
$reg->load_registry_from_db(-host => $host, -user => $user);

my $asma = $reg->get_adaptor($species, 'core', 'AssemblyMapper');
my $csa  = $reg->get_adaptor($species, 'core', 'CoordSystem');
my $sa   = $reg->get_adaptor($species, 'core', 'Slice');

my $from_cs = $csa->fetch_by_name('chromosome', $from );
die "Unknown coord system: $from\n" if not $from_cs;
my $to_cs   = $csa->fetch_by_name('chromosome', $to);
die "Unknown coord system: $to\n" if not $to_cs;

my $mapper = $asma->fetch_by_CoordSystems($from_cs, $to_cs);

$coord =~ /(.*?):(\d+)-(\d+)/;

my ($chr, $start, $end) = ($1, $2, $3);
$chr =~ s/chr//;

my @res = $mapper->map($chr, $start, $end, 1, $from_cs);
foreach my $res ( @res ) {
   if ($res->isa( 'Bio::EnsEMBL::Mapper::Coordinate') ) {
     my $chr_slice = $sa->fetch_by_seq_region_id($res->id);
     print "$chr, $start, $end --> " . join("\t", 
$chr_slice->seq_region_name, $res->start, $res->end) . "\n";
   }
}

Best wishes,
Chris

-- 
NIHR BioResource Rare Diseases
Department of Haematology
University of Cambridge
NHSBT Building
Long Road
Cambridge CB2 0PT





More information about the Dev mailing list