[ensembl-dev] Liftover script from NCBI36 to GRCh37

mag mr6 at ebi.ac.uk
Thu Sep 18 15:03:43 BST 2014


Hi Chris,

In our latest Ensembl release, the main human assembly is GRCh38.
This means that we provide mappings between GRCh38 and most previous 
assemblies (GRCh37, NCBI36, NCBI35, NCBI34)
However, we do not provide mappings between the older assemblies.

If you want mappings between GRCh37 and non-GRCh38 assemblies, we 
recommend using the GRCh37 databases.
These are available in release 75, so using the API version 75 would 
point you to the right databases.


Hope that helps,
Magali

On 17/09/2014 16:50, Chris Penkett wrote:
>
> Hi Ensembl developers,
>
> I've been using a script (see below) with the Ensembl API to do 
> liftover of coordinates between different genome versions, but it no 
> longer seems to work for moving NCBI36 to GRCh37.  Is it still 
> possible to do this conversion with the new genome reference?
>
> Output of script for different genome asssembly liftovers:
>
> % ./ens_db_lift.pl NCBI36 GRCh37 chr1:10000-20000
> Can't call method "map" on an undefined value at ./ens_db_lift.pl line 
> 42.
>
> % ./ens_db_lift.pl NCBI36 GRCh38 chr1:10000-20000
> 1, 10000, 20000 --> 1    20137    30137
>
> % ./ens_db_lift.pl GRCh37 GRCh38 chr1:10000-20000
> 1, 10000, 20000 --> 1    10001    20000
>
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use Data::Dumper;
>
> use lib "/src/ensembl/bioperl-1.2.3";
> use lib "/home/cjp64/src/ensembl/ensembl/modules";
>
> use Bio::EnsEMBL::DBSQL::DBAdaptor;
> use Bio::EnsEMBL::AssemblyMapper;
> use Bio::EnsEMBL::Mapper::Coordinate;
> use Bio::EnsEMBL::DBSQL::SliceAdaptor;
>
> my $from  = shift;
> my $to    = shift;
> my $coord = shift;
>
> my $species = "human";
> my $host    = 'ensembldb.ensembl.org';
> my $user    = 'anonymous';
>
> my $reg = 'Bio::EnsEMBL::Registry';
> $reg->load_registry_from_db(-host => $host, -user => $user);
>
> my $asma = $reg->get_adaptor($species, 'core', 'AssemblyMapper');
> my $csa  = $reg->get_adaptor($species, 'core', 'CoordSystem');
> my $sa   = $reg->get_adaptor($species, 'core', 'Slice');
>
> my $from_cs = $csa->fetch_by_name('chromosome', $from );
> die "Unknown coord system: $from\n" if not $from_cs;
> my $to_cs   = $csa->fetch_by_name('chromosome', $to);
> die "Unknown coord system: $to\n" if not $to_cs;
>
> my $mapper = $asma->fetch_by_CoordSystems($from_cs, $to_cs);
>
> $coord =~ /(.*?):(\d+)-(\d+)/;
>
> my ($chr, $start, $end) = ($1, $2, $3);
> $chr =~ s/chr//;
>
> my @res = $mapper->map($chr, $start, $end, 1, $from_cs);
> foreach my $res ( @res ) {
>   if ($res->isa( 'Bio::EnsEMBL::Mapper::Coordinate') ) {
>     my $chr_slice = $sa->fetch_by_seq_region_id($res->id);
>     print "$chr, $start, $end --> " . join("\t", 
> $chr_slice->seq_region_name, $res->start, $res->end) . "\n";
>   }
> }
>
> Best wishes,
> Chris
>





More information about the Dev mailing list