[ensembl-dev] No chromosomes in API for latest bos taurus

Thibaut Hourlier thibaut at ebi.ac.uk
Tue Jan 22 09:22:15 GMT 2019


Hi Simon,

The fix should work for all species which have chromosomes.

The toplevel coordinate system returns all sequences which are not part of a bigger sequence. This means chromosomes, unplaced scaffolds and unlocalised scaffolds.
The chromosome coordinate system returns only the chromosomes.

For cow, if you use chromosome (karyotype), you will have 30 sequences whereas if you use toplevel you will have 2211 sequences as the assembly has some unplaced scaffolds.
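
For a script shared across species, the difference can be wrapped in one helper. This is only a minimal sketch: the name chromosome_slices is hypothetical, and falling back to toplevel when no sequence carries the karyotype attribute is an assumption, not established Ensembl practice. It relies only on the two SliceAdaptor calls already shown in this thread.

```perl
use strict;
use warnings;

# Return the slices to treat as "chromosomes" for any species.
# $slice_adaptor is expected to be a Bio::EnsEMBL::DBSQL::SliceAdaptor
# (anything offering the same two methods will do).
sub chromosome_slices {
    my ($slice_adaptor) = @_;

    # Sequences flagged with the karyotype attribute, i.e. the chromosomes.
    my @slices = @{ $slice_adaptor->fetch_all_karyotype() };
    return @slices if @slices;

    # Assumption: with no karyotype at all, fall back to every toplevel
    # sequence (this also includes unplaced and unlocalised scaffolds).
    return @{ $slice_adaptor->fetch_all('toplevel', undef, 0, 1) };
}

# Usage, with $db_adapter obtained as in the original script:
#   my @chr_slices = chromosome_slices($db_adapter->get_SliceAdaptor);
```

With a helper like this the calling code does not need to know whether the database has a "chromosome" or a "primary_assembly" coordinate system.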

Previously we were storing the assemblies using the contig - scaffold - chromosome coordinate systems. For many new assemblies the contigs are close to the chromosome size, so to load the assemblies in our databases more easily, we load the whole sequence instead of loading the contigs.
For example, chromosome 1 in cow was submitted as a single sequence.
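
To see which loading scheme a given core database uses, one can list its coordinate systems through the API. A rough sketch follows; the helper name coord_system_summary is hypothetical, and it assumes the adaptor returns Bio::EnsEMBL::CoordSystem objects with name, version, rank and is_sequence_level.

```perl
use strict;
use warnings;

# Summarise the coordinate systems of a core database, one string each.
# $cs_adaptor is expected to be a Bio::EnsEMBL::DBSQL::CoordSystemAdaptor.
sub coord_system_summary {
    my ($cs_adaptor) = @_;
    my @lines;
    foreach my $cs (@{ $cs_adaptor->fetch_all() }) {
        my $version = defined $cs->version ? $cs->version : 'none';
        my $flag    = $cs->is_sequence_level ? ' [sequence level]' : '';
        push @lines, sprintf('%s (version %s, rank %d)%s',
                             $cs->name, $version, $cs->rank, $flag);
    }
    return @lines;
}

# Usage, with $db_adapter obtained as in the original script:
#   print "$_\n" for coord_system_summary($db_adapter->get_CoordSystemAdaptor);
```

A database loaded the older way would be expected to list contig, scaffold and chromosome entries, whereas the cow table quoted below has a single primary_assembly entry that is also the sequence level.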

Hope this helps,
Thibaut

> On 21 Jan 2019, at 17:09, Simon Andrews <simon.andrews at babraham.ac.uk> wrote:
> 
> Thanks for the reply - that’s really helpful and I think it gives me a workaround for now.
>  
> We use this same script across a range of different species, so I’d like to understand whether this is something we’d need to change permanently in the script, or whether it’s peculiar to this assembly.  Is there a different meaning to using a “toplevel” coord system rather than a chromosome-based one when the assembly appears to be chromosome-based overall?
>  
> Thanks
>  
> Simon
>  
> From: Dev <dev-bounces at ensembl.org> On Behalf Of Kostas Billis
> Sent: 21 January 2019 14:52
> To: Ensembl developers list <dev at ensembl.org>
> Subject: Re: [ensembl-dev] No chromosomes in API for latest bos taurus
>  
> Hi Simon, 
>  
>  
> I think your script looks good. 
>  
> For the new cow annotation, we used a different loading system that stores the top-level sequences (primary assembly) directly in seq_region. For this reason, the coord_system table has only one entry: the primary assembly.
>  
> +-----------------+------------+------------------+------------+------+--------------------------------+
> | coord_system_id | species_id | name             | version    | rank | attrib                         |
> +-----------------+------------+------------------+------------+------+--------------------------------+
> |               1 |          1 | primary_assembly | ARS-UCD1.2 |    1 | default_version,sequence_level |
> +-----------------+------------+------------------+------------+------+--------------------------------+
> 1 row in set (0.00 sec)
>  
>  
> You can get all top-level sequences. For example, 
> Instead of: my @chr_slices = @{$db_adapter -> get_adaptor('slice') -> fetch_all('chromosome',undef,0,1)};
> Try: my @chr_slices = @{$db_adapter -> get_adaptor('slice') -> fetch_all('toplevel',undef,0,1)};
>  
> A way to get the chromosomes is via karyotype attribute. For example: 
>  
> my @karyo_slices = @{$db_adapter->get_SliceAdaptor->fetch_all_karyotype()};
>  
> foreach my $karyo (@karyo_slices) {
>   print "FOUND ", $karyo->name, "\n";
> }
>  
> warn "Found ", scalar(@karyo_slices), " karyotype slices and ", scalar(@chr_slices), " toplevel slices for ", $db_adapter->species(), "\n";
>  
>  
>  
>  
> Please let me know if this works for you. 
>  
>  
> Thanks, 
> Kostas 
> 
> 
> On 21 Jan 2019, at 13:36, Simon Andrews <simon.andrews at babraham.ac.uk> wrote:
>  
> Something odd is happening in the API for the latest Bos taurus release.  When I try to get chromosome adapters I can’t find any.
>  
> This worked OK for the previous build and still works OK for other species in the current release.  The web site still shows it as a chromosome based assembly so I’m guessing something is wrong somewhere.
>  
> I’ve put a test script at the bottom - if someone could take a look at this I’d be grateful.
>  
> Thanks
>  
> Simon.
>  
> #!/usr/bin/perl
> use warnings;
> use strict;
> use Bio::EnsEMBL::Registry;
>  
> my $registry = 'Bio::EnsEMBL::Registry';
>  
> $registry->load_registry_from_db(
>     -host => 'ensembldb.ensembl.org',
>     -user => 'anonymous'
>     );
>  
> my $db_adapter = $registry -> get_DBAdaptor("bos taurus","Core");
> #my $db_adapter = $registry -> get_DBAdaptor("mus musculus","Core");
>  
> warn "Genome is ", $db_adapter->species(), "\n";
>  
> my @chr_slices = @{$db_adapter -> get_adaptor('slice') -> fetch_all('chromosome',undef,0,1)};
>  
> warn "Found ", scalar @chr_slices, " chromosomes for ", $db_adapter->species(), "\n";
> The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT Registered Charity No. 1053902.
> The information transmitted in this email is directed only to the addressee. If you received this in error, please contact the sender and delete this email from your system. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Babraham Institute. Full conditions at: www.babraham.ac.uk <http://www.babraham.ac.uk/terms>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>  
