[ensembl-dev] No chromosomes in API for latest bos taurus

Simon Andrews simon.andrews at babraham.ac.uk
Wed Jan 23 08:09:32 GMT 2019


That’s great.  I’ll update the code for all future genomes we process then.

Simon.

From: Dev <dev-bounces at ensembl.org> On Behalf Of Thibaut Hourlier
Sent: 22 January 2019 09:22
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] No chromosomes in API for latest bos taurus

Hi Simon,

The fix should work for all species which have chromosomes.

The toplevel coordinate system returns all sequences which are not part of a bigger sequence. This means chromosomes, unplaced scaffolds and unlocalised scaffolds.
The chromosome coordinate system returns only the chromosomes.

For cow, if you use chromosome (karyotype), you will have 30 sequences whereas if you use toplevel you will have 2211 sequences as the assembly has some unplaced scaffolds.

Previously we stored the assemblies using the contig - scaffold - chromosome coordinate systems. For many new assemblies the contigs are close to chromosome size, so to load the assemblies into our databases more easily, we load the whole sequence instead of loading the contigs.
For example, chromosome 1 in cow was submitted as a single sequence.
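
For a script that runs across many species, the two calls can be combined. The following is only a minimal sketch: get_chromosome_slices is a hypothetical helper name, not part of the Ensembl API, and the fallback to toplevel for assemblies without a karyotype is an assumption about what such a script would want, not something stated in this thread.

```perl
use warnings;
use strict;

# Hypothetical helper (not part of the Ensembl API): return a listref of
# chromosome slices for a core database, regardless of which coordinate
# systems the assembly was loaded with.
sub get_chromosome_slices {
    my ($slice_adaptor) = @_;

    # fetch_all_karyotype() returns the slices that make up the karyotype,
    # i.e. the assembled chromosomes.
    my $karyo_slices = $slice_adaptor->fetch_all_karyotype();
    return $karyo_slices if @$karyo_slices;

    # Assumption: if nothing is flagged as karyotype, the assembly has no
    # chromosomes, so fall back to every toplevel sequence
    # (arguments: coord system, version, include non-reference, include duplicates).
    return $slice_adaptor->fetch_all('toplevel', undef, 0, 1);
}
```

It would be called as get_chromosome_slices($db_adapter->get_SliceAdaptor) in place of the fetch_all('chromosome', ...) call.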

Hope this helps
Thibaut


On 21 Jan 2019, at 17:09, Simon Andrews <simon.andrews at babraham.ac.uk> wrote:

Thanks for the reply - that’s really helpful and I think gives me a workaround for now.

We use this same script across a range of different species so I’d like to understand whether this is something we’d need to change permanently in the script, or if it’s something peculiar to this assembly.  Is there a different meaning for just using a “toplevel” coord system rather than a chromosome based one when the assembly appears to be chromosome based overall?

Thanks

Simon.

From: Dev <dev-bounces at ensembl.org> On Behalf Of Kostas Billis
Sent: 21 January 2019 14:52
To: Ensembl developers list <dev at ensembl.org>
Subject: Re: [ensembl-dev] No chromosomes in API for latest bos taurus

Hi Simon,


I think your script looks good.

For the new cow annotation we used a different loading system and stored the top-level sequences (primary assembly) directly in seq_region. For this reason the coord_system table has only one entry, the primary assembly.

+-----------------+------------+------------------+------------+------+--------------------------------+
| coord_system_id | species_id | name             | version    | rank | attrib                         |
+-----------------+------------+------------------+------------+------+--------------------------------+
|               1 |          1 | primary_assembly | ARS-UCD1.2 |    1 | default_version,sequence_level |
+-----------------+------------+------------------+------------+------+--------------------------------+
1 row in set (0.00 sec)
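
The same information can be checked through the API instead of SQL, which may help when deciding which coordinate system name a script can rely on. This is a rough sketch only: coord_system_summary is a hypothetical helper name, and it assumes it is handed the adaptor returned by $db_adapter->get_CoordSystemAdaptor.

```perl
use warnings;
use strict;

# Hypothetical helper: build one summary line per coordinate system
# defined in a core database.
sub coord_system_summary {
    my ($cs_adaptor) = @_;

    # fetch_all() returns a listref of coordinate system objects;
    # name, version and rank are accessors on each of them.
    return map {
        sprintf("%s (version %s, rank %d)",
                $_->name, $_->version // 'none', $_->rank)
    } @{ $cs_adaptor->fetch_all() };
}
```

Printing the returned list for the new cow database should show a single primary_assembly entry, matching the table above, whereas older databases list chromosome, scaffold and contig entries.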


You can get all top-level sequences. For example,
Instead of: my @chr_slices = @{$db_adapter -> get_adaptor('slice') -> fetch_all('chromosome',undef,0,1)};
Try: my @chr_slices = @{$db_adapter -> get_adaptor('slice') -> fetch_all('toplevel',undef,0,1)};

A way to get the chromosomes is via karyotype attribute. For example:

my @karyo_slices = @{$db_adapter->get_SliceAdaptor->fetch_all_karyotype()};

foreach my $karyo (@karyo_slices) {
  print "FOUND ", $karyo->name, "\n";
}

# Compare the karyotype count with the toplevel count fetched above
warn "Found ", scalar(@karyo_slices), " karyotype slices and ", scalar(@chr_slices), " toplevel slices for ", $db_adapter->species(), "\n";




Please let me know if this works for you.


Thanks,
Kostas



On 21 Jan 2019, at 13:36, Simon Andrews <simon.andrews at babraham.ac.uk> wrote:

Something odd is happening in the API for the latest Bos taurus release.  When I try to get chromosome slices I can’t find any.

This worked OK for the previous build and still works OK for other species in the current release.  The web site still shows it as a chromosome based assembly so I’m guessing something is wrong somewhere.

I’ve put a test script at the bottom - if someone could take a look at this I’d be grateful.

Thanks

Simon.

#!/usr/bin/perl
use warnings;
use strict;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
    );

my $db_adapter = $registry -> get_DBAdaptor("bos taurus","Core");
#my $db_adapter = $registry -> get_DBAdaptor("mus musculus","Core");

warn "Genome is ", $db_adapter->species(), "\n";

my @chr_slices = @{$db_adapter -> get_adaptor('slice') -> fetch_all('chromosome',undef,0,1)};

warn "Found ", scalar @chr_slices, " chromosomes for ", $db_adapter->species(), "\n";
The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT Registered Charity No. 1053902.
The information transmitted in this email is directed only to the addressee. If you received this in error, please contact the sender and delete this email from your system. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Babraham Institute. Full conditions at: www.babraham.ac.uk<http://www.babraham.ac.uk/terms>
_______________________________________________
Dev mailing list    Dev at ensembl.org<mailto:Dev at ensembl.org>
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/
